ABSTRACT

We consider a large-scale service system model proposed in [14], which is motivated by the problem of efficient placement of virtual machines to physical host machines in a network cloud, so that the total number of occupied hosts is minimized. Customers of different types arrive to a system with an infinite number of servers. A server packing configuration is the vector $k = \{k_i\}$, where $k_i$ is the number of type-$i$ customers that the server "contains". Packing constraints are described by a fixed finite set of allowed configurations. Upon arrival, each customer is placed into a server immediately, subject to the packing constraints; the server can be idle or already serving other customers. After service completion, each customer leaves its server and the system.

It was shown in [14] that a simple real-time algorithm, called Greedy, is asymptotically optimal in the sense of minimizing $\sum_k X_k^{1+\alpha}$ in the stationary regime, as the customer arrival rates grow to infinity. (Here $\alpha > 0$, and $X_k$ denotes the number of servers with configuration $k$.) In particular, when parameter $\alpha$ is small, Greedy approximately solves the problem of minimizing $\sum_k X_k$, the number of occupied hosts. In this paper we introduce the algorithm called Greedy with sublinear Safety Stocks (GSS), and show that it asymptotically solves the exact problem of minimizing $\sum_k X_k$. An important feature of the algorithm is that sublinear safety stocks of $X_k$ are created automatically, when and where necessary, without having to determine a priori where they are required. Moreover, we also provide a tight characterization of the rate of convergence to optimality under GSS. The GSS algorithm is as simple as Greedy, and uses no more system state information than Greedy does.

Categories and Subject Descriptors 

[Network Services]: Cloud Computing; [Probability and Statistics]: Markov Processes, Queueing Theory, Stochastic Processes; [Design and Analysis of Algorithms]:
1. INTRODUCTION

We consider a service system model [14] motivated by the 
problem of efficient placement of virtual machines (VMs) to 
physical host machines (servers) in a data center (DC) [6]. 
A service policy decides to which server each incoming VM 
will be placed. We are interested in service policies that 
minimize the total number of occupied servers in the system. 
It is further desirable that the policy be simple, so that 
placement decisions are made in real time, and depend only 
on the current system state, but not on system parameters. 

Consider the following description of a DC. It consists of a number of servers. While servers may potentially have different characteristics, in this paper we assume that they are all the same. More specifically, let there be $N$ different types of resources (for example, type-1 resource can be CPU, type-2 resource can be memory, etc.). For each $n \in \{1, 2, \dots, N\}$, a server possesses amount $B_n > 0$ of type-$n$ resource. $I$ types of VMs arrive in a probabilistic fashion, and request services at the DC. Arriving VMs will be placed into the servers, occupying certain resources. More specifically, for $i \in \{1, 2, \dots, I\}$, a type-$i$ VM requires amount $b_{i,n} \ge 0$ of type-$n$ resource during service, where $n \in \{1, 2, \dots, N\}$. Once a VM completes its service, it departs the system, freeing up the corresponding resources. We assume that service times of different VMs are independent.

For each $i \in \{1, 2, \dots, I\}$, let $k_i$ be the number of type-$i$ VMs that a server contains. Then the following vector packing constraints must be observed at all times. Namely, a server can contain $k_i$ type-$i$ VMs ($i \in \{1, 2, \dots, I\}$) simultaneously if and only if

$$\sum_{i} k_i b_{i,n} \le B_n, \qquad (1)$$

for each $n \in \{1, 2, \dots, N\}$. In this case, the vector $k = (k_1, \dots, k_I)$ is called a server configuration.
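As a minimal illustration of constraint (1), the following Python sketch checks whether a configuration fits a server and enumerates all feasible configurations up to a per-type cap. The function names, the cap `k_max`, and the resource sizes in the example are hypothetical choices made only for illustration.

```python
from itertools import product

def satisfies_vector_packing(k, b, B):
    """Check constraint (1): sum_i k[i] * b[i][n] <= B[n] for every resource n."""
    return all(
        sum(k_i * b_i[n] for k_i, b_i in zip(k, b)) <= B_n
        for n, B_n in enumerate(B)
    )

def feasible_configs(b, B, k_max):
    """Enumerate all k in {0, ..., k_max}^I that satisfy (1)."""
    num_types = len(b)
    return {
        k for k in product(range(k_max + 1), repeat=num_types)
        if satisfies_vector_packing(k, b, B)
    }

# Hypothetical example: 2 VM types, 2 resources (say CPU and memory).
b = [[1, 2],   # a type-1 VM uses 1 unit of resource 1 and 2 units of resource 2
     [2, 1]]   # a type-2 VM uses 2 units of resource 1 and 1 unit of resource 2
B = [4, 4]     # server capacities B_1, B_2
print(sorted(feasible_configs(b, B, k_max=4)))
# prints [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
```

Note that the resulting set contains $(1,1)$ together with all componentwise-smaller vectors, as expected for vector packing.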



The model considered in this paper is similar to the DC 
described above, but different in the following two aspects. 

1. While vector packing constraints (cf. Eq. (1)) arise naturally in the context of VM placement, we make the more general assumption of so-called monotone packing constraints (cf. Section 2.1) in our model.

2. We consider a system with an infinite number of servers, 
where incoming VMs will be immediately placed into 
a server. For large-scale DCs, the number of servers is 
not a bottleneck, hence an infinite-server system rea- 
sonably approximates such DCs. 

We would also like to remark that an important assumption of our model is that the service requirement of a VM is not affected by other VMs that may occupy the same server. This is a reasonable modeling assumption for multi-core servers, for example.

There can be different performance objectives of interest. For example, we may be interested in minimizing the total energy consumption [6], or maximizing system throughput [13]. In this paper, we are interested in minimizing the total number of occupied servers. These objectives are different but related. For example, by switching off idle servers, or keeping them in stand-by mode, we can reduce energy consumption by minimizing the number of occupied servers.

In the main results of the paper, we introduce the policy called Greedy with sublinear Safety Stocks (GSS), and show that it asymptotically minimizes the total number of occupied servers in steady state, as the input flow rates of VMs grow to infinity. GSS is a simple policy that makes placement decisions in real time, based only on the current system state. Informally speaking, GSS places incoming VMs in a way that greedily minimizes a Lyapunov function, which asymptotically coincides with the total number of occupied servers. GSS maintains non-empty safety stocks at every server configuration $k$ whenever $X_k$ becomes "too small", so as to allow flexibility in VM placement. In other words, under GSS, there is a non-zero number of servers of every configuration, so that an incoming VM can potentially be placed into a server with any configuration. These safety stocks correspond to the discrepancy between the Lyapunov function and the total number of occupied servers, and grow "sublinearly" with the input flow rates. We also provide a characterization of the rate of convergence to optimality under GSS, which is tighter than the conventional fluid-scale convergence rate.

1.1 Related Works 

In this section, we discuss related works, and put our re- 
sults in perspective. 

The most closely related work is [14], where the model considered in this paper was proposed, and a related problem was studied. In both this paper and [14], the asymptotic regime of interest is when the input flow rates grow to infinity, and the system is considered under the fluid scaling, i.e., when the system states are scaled down by the input flow rates. In [14], the problem of interest is minimizing $\sum_k X_k^{1+\alpha}$, where $\alpha > 0$, and $X_k$ is the number of occupied servers with configuration $k$. A simple policy called Greedy was introduced, which asymptotically minimizes the sum $\sum_k X_k^{1+\alpha}$, for $\alpha > 0$, in the stationary regime. Policies Greedy and GSS differ in two important aspects. First, they try to minimize different objectives: $\sum_k X_k^{1+\alpha}$ ($\alpha > 0$) and $\sum_k X_k$, respectively. When $\alpha > 0$ is small, Greedy approximately solves the problem of minimizing the total number of occupied servers $\sum_k X_k$, in the asymptotic regime where the input flow rates grow to infinity, and at the fluid scale. However, if minimizing $\sum_k X_k$ is the "true" desired objective, $\alpha > 0$ needs to be chosen carefully, depending on the system scale (input flow rates), which may be difficult to do. Therefore, we believe that asymptotically solving the exact problem of minimizing $\sum_k X_k$ is of substantial interest. Moreover, the policy GSS proposed in this paper is as simple as Greedy, and uses no more system state information than Greedy does. Second, at a technical level, to prove the asymptotic optimality of Greedy, [14] considered only the fluid scaling and the corresponding fluid limits. In this paper, to prove the asymptotic optimality of GSS, it is no longer sufficient to consider the fluid-scale system behavior alone; a local fluid scaling is also needed, to study the dynamics of safety stocks. In addition, this allows us to derive a tighter characterization of the rate of convergence to optimality under GSS, as opposed to the fluid-scale convergence shown in [14] for Greedy.

On a broader level, the model considered in this paper is related to the vast literature on classical stochastic bin packing problems. In a bin packing system, random-sized items arrive, and need to be placed into finite-sized bins. The items do not leave or move between bins, and a typical objective is to minimize the number of occupied bins. A packing problem is one-dimensional if sizes of the items and bins are captured by scalars, and multi-dimensional if they are captured by vectors. Problems with the multi-dimensional packing constraints (1) are called vector packing. For a good review of one-dimensional bin packing, see for example [2], and see for example [T] for a recent review of multi-dimensional packing. In bin packing service systems, items (customers) arrive at random times to be placed into a bin (server), and leave after a random service time. The servers can process multiple customers as long as packing constraints are observed. Customers get queued, and a typical objective of a packing algorithm is to maximize system throughput. (See for example [J for a review of this line of work.) Our model is similar to the latter systems, except there are multiple bins (servers); in fact, an infinite number in our case. Models of this type are more recent (see for example [8, 9]). [8] addresses a joint routing and VM placement problem, which in particular includes packing constraints. The approach of [8] resembles Markov Chain algorithms used in combinatorial optimization. [S] considers maximizing throughput of a queueing system with a finite number of bins (servers), where VMs can wait for service. Very recently, [7] has new results on the classical one-dimensional online bin packing; it also contains heuristics and simulations for the corresponding system with item departures, which is a special case of our model.

As mentioned earlier, we consider the asymptotic regime where the input flow rates scale up to infinity. In this respect, our work is related to the (also vast) literature on queueing systems in the many servers regime. (See e.g. [T^] for an overview. The name "many servers" reflects the fact that the average number of occupied servers scales up to infinity as well, linearly with the input flow rates.) However, packing constraints are not present in earlier works (prior to [14]) on the many servers regime, to the best of our knowledge.

The idea of maintaining sublinear safety stocks to increase 
system flexibility, and hence avoid "resource" starvation - 
the approach taken by GSS, the policy proposed in this pa- 
per - has also appeared in other works. For example, see [TU] 
and the references therein for an overview. However, to 
the best of our knowledge, the following feature of GSS is 
novel, and has not appeared in algorithms proposed in ear- 
lier works. Namely, GSS creates safety stocks automatically, 
in the sense that it does not require a priori knowledge of the 
subset of configurations for which the sublinear safety stocks 
need to be maintained. As a result, GSS does not require 
any a priori knowledge of the system parameters, because 
the safety stocks automatically adapt to parameter changes. 
We remark that the policy Greedy proposed in [14] also creates safety stocks, but they scale linearly with the input flow rates, whereas GSS creates sublinear safety stocks.

Finally, an overview of some resource allocation issues that 
arise from VM placement in the context of cloud computing 
can be found in [6]. 

1.2 Organization 

The rest of the paper is organized as follows. In Section 1.3, we introduce the notation and conventions adopted in this paper. The precise model and main results are described in Section 2. The model is introduced in Section 2.1. Here we describe two versions of the model, the closed and open system. In Section 2.2, we describe the asymptotic regime of interest. The GSS policy is described in Section 2.3, and the main results, Theorems 6 and 7, are stated in Section 2.4 for the closed and open system, respectively. Sections 3 and 4 are devoted to proving Theorems 6 and 7, respectively. A discussion of the results in this paper and some future directions is provided in Section 5.

1.3 Notation and Conventions 

Let $\mathbb{R}$ be the set of real numbers, and let $\mathbb{R}_+$ be the set of nonnegative real numbers. Let $\mathbb{Z}$ be the set of integers, let $\mathbb{Z}_+$ be the set of nonnegative integers, and let $\mathbb{N}$ be the set of natural numbers. $\mathbb{R}^n$ denotes the real vector space of dimension $n$, and $\mathbb{R}^n_+$ denotes the nonnegative orthant of $\mathbb{R}^n$. $\mathbb{Z}^n$ and $\mathbb{Z}^n_+$ are similarly defined. We reserve bold letters for vectors, and plain letters for scalars and sets. For a scalar $x$, let $|x|$ denote its absolute value, and let $\lfloor x \rfloor$ denote the largest integer that does not exceed $x$. For two scalars $x$ and $y$, let $x \wedge y = \min\{x, y\}$, and let $x \vee y = \max\{x, y\}$. For a vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$, let $\|x\|$ denote its 1-norm, i.e., $\|x\| = \sum_{i=1}^n |x_i|$. The distance from vector $x \in \mathbb{R}^n$ to a set $U \subset \mathbb{R}^n$ is denoted by $d(x, U) = \inf_{u \in U} \|x - u\|$. We use $e_i$ to denote the $i$-th standard unit vector, with only the $i$th component being 1, and all other components being 0. For a set $\mathcal{N}$, let $I_{\mathcal{N}}$ be the indicator function of $\mathcal{N}$. For a finite set $\mathcal{N}$, let $|\mathcal{N}|$ be its cardinality. For two sets $\mathcal{N}$ and $\mathcal{M}$, let $\mathcal{N} \setminus \mathcal{M}$ denote the set difference of $\mathcal{N}$ and $\mathcal{M}$, i.e., $\mathcal{N} \setminus \mathcal{M} = \{x \in \mathcal{N} : x \notin \mathcal{M}\}$. For a set $\mathcal{N} \subset \mathbb{R}^n$, let $\langle \mathcal{N} \rangle$ denote its convex hull, i.e., the set of all $x \in \mathbb{R}^n$ such that there exist $\gamma_1, \dots, \gamma_m \in \mathbb{R}_+$ and $v_1, \dots, v_m \in \mathcal{N}$ with $x = \sum_{j=1}^m \gamma_j v_j$ and $\sum_{j=1}^m \gamma_j = 1$. Symbol $\to$ means ordinary convergence in $\mathbb{R}^n$, and $\Rightarrow$ denotes convergence in distribution of random variables taking values in $\mathbb{R}^n$, equipped with the Borel $\sigma$-algebra. The abbreviation w.p.1 means convergence with probability 1. We often write $x(\cdot)$ to mean the function (or random process) $\{x(t), t \ge 0\}$. We write iff as a shorthand for "if and only if", i.o. for "infinitely often", LHS for "left-hand side" and RHS for "right-hand side". We also write WLOG for "without loss of generality", w.r.t. for "with respect to", and u.o.c. for "uniformly on compact sets".

Throughout this paper, if $x(\cdot)$ is a random process (which in most cases will be Markov), we will denote by $x(\infty)$ its random state when the process is in stationary regime; in other words, $x(\infty)$ is equal in distribution to $x(t)$ (for any $t$) when $x(\cdot)$ is stationary. We use the terms steady state and stationary regime interchangeably.

2. MODEL AND MAIN RESULTS 

2.1 Infinite Server System with Packing Constraints

We consider the following infinite server system that evolves in continuous time. There are $I$ types of customers, indexed by $i \in \{1, 2, \dots, I\} = \mathcal{I}$, and an infinite number of homogeneous servers. A server can potentially serve more than one customer simultaneously. We use $k = (k_1, k_2, \dots, k_I) \in \mathbb{Z}^I_+$, an $I$-dimensional vector with nonnegative integer components, to denote a server configuration. The general packing constraints are captured by the finite set $\bar{\mathcal{K}} \subset \mathbb{Z}^I_+$ of feasible server configurations. Thus, a server can simultaneously serve $k_i$ customers of type $i$, $i \in \mathcal{I}$, iff $k = (k_1, k_2, \dots, k_I) \in \bar{\mathcal{K}}$. From now on, we drop the word "feasible", and simply call $\bar{\mathcal{K}}$ the set of server configurations.

In this paper, we assume that the set $\bar{\mathcal{K}}$ is monotone.

Assumption 1. $\bar{\mathcal{K}}$ is monotone in the following sense. If $k \in \bar{\mathcal{K}}$, and $k' \in \mathbb{Z}^I_+$ has $k' \le k$ component-wise, then $k' \in \bar{\mathcal{K}}$ as well.

A simple consequence of the monotonicity assumption is that $0 \in \bar{\mathcal{K}}$. We now let $\mathcal{K} = \bar{\mathcal{K}} \setminus \{0\}$ denote the set of non-zero server configurations.

Vector Packing is Monotone. An important example of monotone packing is vector packing. Consider the vector packing constraints in (1). It is clear that if the server configuration $k = (k_1, \dots, k_I)$ satisfies (1), and if $k' \le k$ component-wise, then $k'$ also satisfies (1). On the other hand, not all monotone packing is vector packing. For example, when $I = 2$, $\bar{\mathcal{K}} = \{(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)\}$ is monotone, but is not described by vector packing constraints: any constraints of the form (1) satisfied by both $(2, 0)$ and $(0, 2)$ are also satisfied by their midpoint $(1, 1) \notin \bar{\mathcal{K}}$. In the sequel, we will only assume monotone packing in our model, and all our results hold under this general setting.
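The monotonicity in Assumption 1 is easy to test mechanically. The Python sketch below (function names are hypothetical) checks it for the example set $\bar{\mathcal{K}}$ with $I = 2$ discussed above. The second function spot-checks, for a single resource and a small grid of nonnegative sizes, the inequality $b_1 + b_2 \le \max(2b_1, 2b_2)$, which is why $(1,1)$ fits whenever $(2,0)$ and $(0,2)$ both fit; it is a sanity check, not a proof.

```python
from itertools import product

def is_monotone(configs):
    """Assumption 1: every k' with 0 <= k' <= k (componentwise) of a member k is a member."""
    S = set(map(tuple, configs))
    return all(
        k_prime in S
        for k in S
        for k_prime in product(*(range(c + 1) for c in k))
    )

def midpoint_fits(max_size=5):
    """One resource, sizes b1, b2 >= 0, capacity B = max(2*b1, 2*b2) (the smallest
    capacity admitting both (2,0) and (0,2)); then (1,1) needs b1 + b2 <= B."""
    return all(
        b1 + b2 <= max(2 * b1, 2 * b2)
        for b1 in range(max_size + 1)
        for b2 in range(max_size + 1)
    )

K_bar = {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)}
print(is_monotone(K_bar), midpoint_fits())
```

A set such as $\{(0,0), (1,1)\}$ fails the check, since $(0,1)$ and $(1,0)$ are missing.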

To exclude triviality, we also assume that for all $i \in \mathcal{I}$, $e_i$ (the $i$-th standard unit vector) is an element of $\mathcal{K}$.

As discussed in the introduction, we make the following 
important assumption in this paper. We assume that si- 
multaneous services do not affect the service distributions 
of individual customers; in other words, the service time of 
a customer is unaffected by whether or not there are other 
customers served simultaneously by the same server. Let 
us also remark that ideally, we would like to consider an 
open system, where each arriving customer is immediately 
placed for service in one of the servers, and leaves the system 
after service completion. However, we will first consider a 
"closed" version of this open system. The reason is twofold. 
First, the analysis of the closed system is a stepping stone to that of the open system, and illustrates the main ideas
more clearly. Second, we will see shortly that the closed sys- 
tem can be used to model job migration in a cloud, and is 
therefore of independent interest. 

Denote by $X_k$ the number of servers with configuration $k \in \mathcal{K}$. The system state is then the vector $X = \{X_k, k \in \mathcal{K}\}$. By convention, $X_0 = 0$ at all times.

Closed System. Here we describe the "closed" version of the model. Let $r \in \mathbb{N}$ be given. Suppose that there are in total $r$ customers in the system, and no exogenous arrivals. For each $i \in \mathcal{I}$, we suppose that there are $\rho_i r$ customers of type $i$ in the system at all times. This in particular implies that $\sum_{i \in \mathcal{I}} \rho_i = 1$. It is convenient to index the system by $r$, its total number of customers, and we use $X^r = \{X^r_k, k \in \mathcal{K}\}$ to denote a system state. The system evolves as follows. Each customer is almost always in service, except at a discrete set of time instances, where it migrates from one server to another (possibly the same one), subject to the packing constraints imposed by $\bar{\mathcal{K}}$. For a customer, the time between consecutive migrations is called its service requirement. Thus, one can alternatively think of a customer as departing the system after its service requirement, and then immediately arriving to the system, to be placed into a server. For each $i$, we assume that the service requirements of type-$i$ customers are i.i.d. exponential random variables with mean $1/\mu_i$, and that the service requirements are independent across different $i \in \mathcal{I}$. A (Markovian) service policy ("packing rule") decides to which server a customer will be placed after its service requirement, based only on the current system state $X^r$. A service policy has to observe the packing constraints. Under any well-defined service policy, the system state at time $t$, $X^r(t)$, is a continuous-time Markov chain on a finite state space. Hence, for each $r$, the process $\{X^r(t), t \ge 0\}$ always has a stationary distribution.

Open System. In the open system, customers of type $i$ arrive exogenously as an independent Poisson flow of rate $\lambda_i r$, where $\lambda_i$ is fixed and $r$ is a scaling parameter. Each arriving customer has to be placed for service immediately in one of the servers, subject to the packing constraints imposed by $\bar{\mathcal{K}}$. Service times of all customers are independent. Service time of a type-$i$ customer is exponentially distributed with mean $1/\mu_i$. After a service completion, each customer leaves the system. If we denote $\rho_i = \lambda_i/\mu_i$, then in steady state, the average number of type-$i$ customers in the system is $\rho_i r$, and the average total number of customers is $\sum_i \rho_i r$. We assume, WLOG, that $\sum_i \rho_i = 1$; this is equivalent to re-choosing the value of parameter $r$, if necessary. A (Markovian) service policy ("packing rule") in this case decides to which server an arriving customer will be placed, based only on the current system state. A service policy has to observe the packing constraints. Similar to the closed system, we let $X^r_k(t)$ denote the number of servers with configuration $k$ at time $t$ in the $r$th system. However, for the policy that we will study, $X^r(t) = (X^r_k(t))_{k \in \mathcal{K}}$ will not be a Markov process. We postpone the discussion of a complete Markovian description of the system and the existence of the associated stationary distribution to Section 2.3.2.

2.2 Asymptotic Regime 

We are interested in finding a service policy that minimizes the total number of occupied servers in the stationary regime. The exact problem is intractable, so instead we consider asymptotically optimal service policies. For both the closed and open systems, the asymptotic regime of interest is when $r \to \infty$. Informally speaking, in this limit, the fluid-scaled system state satisfies a conservation law (cf. Eq. (2)), and the best that a policy can do is to solve a linear program, subject to this conservation law. We now describe the asymptotic regime in more detail.

First, we define the so-called fluid scaling. Recall that both the closed and open systems are indexed by $r$, and $X^r(t)$ is the vector that denotes the numbers of servers at time $t$, in the $r$th system. The fluid-scaled process is $x^r(t) = X^r(t)/r$. For each $r$, in the closed system, $X^r(\cdot)$ has a (not necessarily unique) stationary distribution, so $x^r(\cdot)$ also has a stationary distribution. We will see shortly that in an open system, $X^r(\cdot)$ also has a stationary distribution (see Lemma 5). Denote by $X^r(\infty)$ and $x^r(\infty)$ the random states of the corresponding processes in a stationary regime. (Recall the convention in Section 1.3.)

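We now argue that as $r \to \infty$,

$$\sum_{k \in \mathcal{K}} k_i x^r_k(\infty) \Rightarrow \rho_i, \quad \text{for all } i. \qquad (2)$$

In a closed system, for each $i \in \mathcal{I}$, there are $\rho_i r$ customers of type $i$ in the system at all times, so on all sample paths,

$$\sum_{k \in \mathcal{K}} k_i x^r_k(t) = \rho_i, \quad \text{for all } r, t \text{ and } i.$$

This implies that the same holds for $x^r(\infty)$. In an open system, the total number of type-$i$ customers is $\sum_{k \in \mathcal{K}} k_i X^r_k(\infty)$, in steady state. It is easy to see that, independent of the service policy, this quantity is a Poisson random variable with mean $\rho_i r$. Hence the scaled quantity $\sum_{k \in \mathcal{K}} k_i x^r_k(\infty)$ has mean $\rho_i$ and variance $\rho_i/r$, so that, as $r \to \infty$, $\sum_{k \in \mathcal{K}} k_i x^r_k(\infty) \Rightarrow \rho_i$.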
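Now consider the following linear program (LP).

$$\text{Minimize} \quad \sum_{k \in \mathcal{K}} x_k \qquad (3)$$

$$\text{subject to} \quad \sum_{k \in \mathcal{K}} k_i x_k = \rho_i, \quad \text{for all } i \in \mathcal{I}, \qquad (4)$$

$$x_k \ge 0, \quad \text{for all } k \in \mathcal{K}. \qquad (5)$$

Denote by $\mathcal{X}$ the set of feasible solutions to LP:

$$\mathcal{X} = \Big\{x \in \mathbb{R}^{|\mathcal{K}|}_+ : \sum_{k \in \mathcal{K}} k_i x_k = \rho_i, \ i \in \mathcal{I}\Big\}.$$

Then $\mathcal{X}$ is a compact subset of $\mathbb{R}^{|\mathcal{K}|}_+$. Let $\mathcal{X}^*$ denote the set of optimal solutions of LP, and let $u^*$ denote its optimal value. In light of Eqs. (2) and (4), a service policy is asymptotically optimal if, roughly speaking, under this policy and for large $r$, $\sum_{k \in \mathcal{K}} x^r_k(\infty) \approx u^*$ with high probability (cf. Theorems 6 and 7).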
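For small instances, LP (3)-(5) can be solved by enumerating basic feasible solutions: the feasible set is a polytope, so an optimum is attained at a vertex. The sketch below is an illustration only, not a general-purpose LP solver; it assumes $I = 2$ (so each basis consists of two configurations and Cramer's rule applies), uses exact rational arithmetic, and takes as hypothetical data the non-zero part of the example set $\bar{\mathcal{K}}$ from Section 2.1 with $\rho = (1/2, 1/2)$.

```python
from fractions import Fraction
from itertools import combinations

def solve_lp_two_types(configs, rho):
    """Minimize sum_k x_k s.t. sum_k k_i x_k = rho_i (i = 0, 1), x >= 0,
    by enumerating basic feasible solutions (pairs of configurations)."""
    best_val, best_x = None, None
    for k, kp in combinations(configs, 2):
        det = k[0] * kp[1] - k[1] * kp[0]
        if det == 0:
            continue  # linearly dependent columns: not a basis
        # Cramer's rule for x_k * k + x_kp * kp = rho
        x_k = Fraction(rho[0] * kp[1] - rho[1] * kp[0]) / det
        x_kp = Fraction(k[0] * rho[1] - k[1] * rho[0]) / det
        if x_k < 0 or x_kp < 0:
            continue  # basic but infeasible
        val = x_k + x_kp
        if best_val is None or val < best_val:
            best_val, best_x = val, {k: x_k, kp: x_kp}
    return best_val, best_x

configs = [(1, 0), (2, 0), (0, 1), (0, 2)]
rho = (Fraction(1, 2), Fraction(1, 2))
u_star, x_star = solve_lp_two_types(configs, rho)
print(u_star, x_star)
```

Here the optimum packs every customer into a "full" server of configuration $(2,0)$ or $(0,2)$, giving $u^* = 1/2$.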

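The following characterization of the set $\mathcal{X}^*$ by dual variables will be useful. The proof is elementary and omitted.

Lemma 2. $x = (x_k)_{k \in \mathcal{K}} \in \mathcal{X}^*$ iff $x$ is a feasible solution of LP, and there exist $\eta_i \in \mathbb{R}$, $i \in \mathcal{I}$, such that

(i) $\sum_{i \in \mathcal{I}} k_i \eta_i \le 1$ for all $k \in \mathcal{K}$, and

(ii) $\sum_{i \in \mathcal{I}} k_i \eta_i = 1$ for all $k \in \mathcal{K}$ with $x_k > 0$.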
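A dual characterization of this kind is standard LP optimality: primal feasibility, dual feasibility ($\sum_i k_i \eta_i \le 1$ for every $k$), and complementary slackness ($\sum_i k_i \eta_i = 1$ wherever $x_k > 0$). The Python sketch below checks such a certificate; the function name and the example data are hypothetical. A `False` result only means that the particular $\eta$ supplied does not certify $x$.

```python
from fractions import Fraction

def certifies_optimal(x, eta, configs, rho):
    """Return True if x is feasible for LP (3)-(5) and eta certifies optimality:
    dual feasibility sum_i k_i eta_i <= 1 for all k, and
    complementary slackness sum_i k_i eta_i = 1 whenever x_k > 0."""
    n_types = len(rho)
    if any(x.get(k, 0) < 0 for k in configs):
        return False  # violates (5)
    for i in range(n_types):
        if sum(k[i] * x.get(k, 0) for k in configs) != rho[i]:
            return False  # violates (4)
    for k in configs:
        load = sum(k[i] * eta[i] for i in range(n_types))
        if load > 1:
            return False  # dual infeasible
        if x.get(k, 0) > 0 and load != 1:
            return False  # complementary slackness fails
    return True

half, quarter = Fraction(1, 2), Fraction(1, 4)
configs = [(1, 0), (2, 0), (0, 1), (0, 2)]
rho = (half, half)
eta = (half, half)
x_opt = {(2, 0): quarter, (0, 2): quarter}   # objective 1/2
x_sub = {(1, 0): half, (0, 1): half}         # feasible, objective 1
print(certifies_optimal(x_opt, eta, configs, rho),
      certifies_optimal(x_sub, eta, configs, rho))
```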

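The following lemma relates the distance between a point $x \in \mathcal{X}$ and the optimal set $\mathcal{X}^*$ to the objective value of LP evaluated at $x$.

Lemma 3. There exists a positive constant $D \ge 1$ such that for any $x \in \mathcal{X}$,

$$D \Big(\sum_{k \in \mathcal{K}} x_k - u^*\Big) \ge d(x, \mathcal{X}^*).$$

Note that $D \ge 1$ is necessary, since for every $x \in \mathcal{X}$ and $y \in \mathcal{X}^*$, $\|x - y\| \ge \sum_{k \in \mathcal{K}} x_k - \sum_{k \in \mathcal{K}} y_k = \sum_{k \in \mathcal{K}} x_k - u^*$, so that $d(x, \mathcal{X}^*) \ge \sum_{k \in \mathcal{K}} x_k - u^*$.

Proof. See Appendix A. □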

2.3 Greedy with sublinear Safety Stocks (GSS) 

Now we introduce the service policy, Greedy with sublinear 
Safety Stocks (GSS), along with a variant, which we will 
prove to be asymptotically optimal. 

2.3.1 GSS Policy in a Closed System
GSS. Let $p \in (\frac{1}{2}, 1)$. For a given $r$, define a weight function $w^r : \mathbb{R}_+ \to \mathbb{R}_+$ to be $w^r(X) = 1 \wedge \frac{X}{r^p}$. Let $\mathcal{M}$ denote the set of all pairs $(k, i) \in \mathcal{K} \times \mathcal{I}$ such that $k \in \mathcal{K}$ and $k - e_i \in \bar{\mathcal{K}}$. Given $X = \{X_{k'}, k' \in \mathcal{K}\}$ and $(k, i) \in \mathcal{M}$, define $\Delta^r_{(k,i)}(X) = w^r(X_k) - w^r(X_{k-e_i})$. Under GSS, a customer of type $i$ is placed into a server with configuration $k - e_i$, where $X_{k-e_i} > 0$ or $k - e_i = 0$, such that $\Delta^r_{(k,i)}(X)$ is minimal. Ties are broken arbitrarily.
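The GSS placement rule can be sketched in Python as follows. This is a minimal illustration, assuming the state is a dictionary from non-zero configurations (tuples) to server counts and that customer types are indexed from 0; the function name `gss_place` and the example numbers are hypothetical. Ties are broken by iteration order, which is one admissible choice of "arbitrary".

```python
def gss_place(X, i, configs, r, p):
    """Return the configuration k - e_i of the server that receives a type-i customer
    under GSS (the all-zero tuple means an empty server). X maps non-zero configs to counts."""
    def w(x):
        return min(1.0, x / r ** p)  # w^r(X) = 1 ∧ X / r^p

    best_src, best_delta = None, None
    for k in configs:
        if k[i] < 1:
            continue  # (k, i) is not in M
        src = tuple(c - 1 if j == i else c for j, c in enumerate(k))
        src_is_zero = not any(src)
        if not src_is_zero and X.get(src, 0) == 0:
            continue  # no server with configuration k - e_i exists
        x_src = 0 if src_is_zero else X[src]  # convention X_0 = 0
        delta = w(X.get(k, 0)) - w(x_src)
        if best_delta is None or delta < best_delta:
            best_src, best_delta = src, delta
    return best_src

configs = [(1, 0), (2, 0), (0, 1), (0, 2)]
X = {(1, 0): 50, (2, 0): 0, (0, 1): 10, (0, 2): 40}
print(gss_place(X, 0, configs, r=100, p=0.75))  # (1, 0): fill a half-empty server
print(gss_place(X, 1, configs, r=100, p=0.75))  # (0, 0): start a new server
```

In the second call, the stock of $(0,1)$ servers (10) is below $r^p \approx 31.6$, so the rule prefers to grow that stock rather than consume it, which is the safety-stock behavior described in the introduction.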

Note that the GSS policy makes decisions based only on the current system state. The parameter $r$ which it uses is nothing else but the total number of customers in the system, which is, of course, a function of the state, and which happens to be constant in the closed system.

We now provide an intuitive explanation of the policy. Let $f^r$ be the anti-derivative of $w^r$, so that

$$f^r(X) = \begin{cases} \dfrac{X^2}{2r^p}, & \text{if } X \in [0, r^p], \\[4pt] X - \dfrac{r^p}{2}, & \text{if } X > r^p. \end{cases}$$

Let $F^r(X) = \sum_{k \in \mathcal{K}} f^r(X_k)$. Then $w^r$ and $\Delta^r_{(k,i)}$ capture the first-order change in $F^r$. Suppose that the current system state is $X = \{X_k\}_{k \in \mathcal{K}}$. Then, placing a type-$i$ customer into a server with configuration $k - e_i$ only changes $X_{k-e_i}$ and $X_k$: $X_{k-e_i}$ decreases by 1 (if $X_{k-e_i} > 0$), and $X_k$ increases by 1. Thus, the first-order change in $F^r$ is

$$w^r(X_k) - w^r(X_{k-e_i}) = \Delta^r_{(k,i)}(X).$$

In this sense, GSS decreases $F^r$ greedily, by placing a customer into a server that results in the largest (first-order) decrease in $F^r$.
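To see numerically that $\Delta^r_{(k,i)}$ is the first-order change of $F^r$, the Python sketch below compares the exact change of $F^r$ caused by one placement with $\Delta^r_{(k,i)}(X)$. Since the derivative $w^r$ of $f^r$ is Lipschitz with constant $r^{-p}$, each of the (at most two) coordinate changes has a second-order error of at most $r^{-p}/2$, so the two quantities differ by at most $r^{-p}$. The helper names and the example state are hypothetical.

```python
def f(x, rp):
    """Antiderivative f^r of w^r(x) = min(1, x / rp), with f^r(0) = 0; rp stands for r**p."""
    return x * x / (2 * rp) if x <= rp else x - rp / 2

def w(x, rp):
    return min(1.0, x / rp)

def placement_change(X, k, src, rp):
    """Exact and first-order change of F^r = sum_k f^r(X_k) when one server moves
    from configuration src = k - e_i to k (src all-zero means an empty server)."""
    x_k = X.get(k, 0)
    if any(src):
        x_s = X[src]
        exact = (f(x_k + 1, rp) - f(x_k, rp)) + (f(x_s - 1, rp) - f(x_s, rp))
        first_order = w(x_k, rp) - w(x_s, rp)
    else:
        exact = f(x_k + 1, rp) - f(x_k, rp)
        first_order = w(x_k, rp)
    return exact, first_order

rp = 100 ** 0.75
X = {(1, 0): 50, (2, 0): 0}
exact, first_order = placement_change(X, (2, 0), (1, 0), rp)
print(round(exact, 4), first_order)
```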

The next lemma states that $F^r(X)$ only differs from $\sum_{k \in \mathcal{K}} X_k$ by $O(r^p)$. The proof is straightforward and omitted.

Lemma 4. For any $X \in \mathbb{R}^{|\mathcal{K}|}_+$,

$$\sum_{k \in \mathcal{K}} X_k - \frac{r^p}{2}|\mathcal{K}| \le F^r(X) \le \sum_{k \in \mathcal{K}} X_k.$$

Under the fluid scaling described earlier, the difference $O(r^p)$ between $F^r(X)$ and $\sum_{k \in \mathcal{K}} X_k$ becomes negligible, as it is of order $o(r)$ (recall that $p < 1$). Thus, for a fluid-scaled process, minimizing $F^r(X)$ (what GSS tries to do) is "equivalent" to minimizing $\sum_{k \in \mathcal{K}} X_k$, when $r$ is large.
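The two-sided bound relating $F^r(X)$ and $\sum_k X_k$ follows coordinate-wise from $x - r^p/2 \le f^r(x) \le x$; for $x \le r^p$, the lower bound is equivalent to $(x - r^p)^2 \ge 0$. The short Python check below (hypothetical helper names, arbitrary test values) evaluates both sides of the summed bound for a given list of counts.

```python
def f(x, rp):
    """f^r(x): x^2 / (2 r^p) on [0, r^p], and x - r^p / 2 beyond."""
    return x * x / (2 * rp) if x <= rp else x - rp / 2

def sandwich_holds(X, r, p):
    """Check sum X_k - |K| r^p / 2 <= F^r(X) <= sum X_k for a list of counts X."""
    rp = r ** p
    total = sum(X)
    F = sum(f(x, rp) for x in X)
    return total - len(X) * rp / 2 <= F <= total

print(sandwich_holds([0, 3, 50, 200], r=100, p=0.75))
```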

2.3.2 GSS Policy in an Open System

First, we describe the "pure" GSS policy.

GSS. Let $p \in (\frac{1}{2}, 1)$. For a given system state $X$, let $Z = Z(X)$ denote the total number of customers in the system. Define a weight function $\tilde{w}(X; Z)$ as follows: $\tilde{w}(X; Z) = 1 \wedge \frac{X}{Z^p}$. (Note that $\tilde{w}$ generalizes the corresponding weight function $w^r(X) = 1 \wedge \frac{X}{r^p}$ for the closed system, because in the closed system with parameter $r$ the total number of customers is constant, $Z = r$.) Let $\mathcal{M}$ denote the set of all pairs $(k, i) \in \mathcal{K} \times \mathcal{I}$ such that $k \in \mathcal{K}$ and $k - e_i \in \bar{\mathcal{K}}$. Given $X = \{X_{k'}, k' \in \mathcal{K}\}$ and $(k, i) \in \mathcal{M}$, define $\tilde{\Delta}_{(k,i)}(X) = \tilde{w}(X_k; Z) - \tilde{w}(X_{k-e_i}; Z)$. Under GSS, an arriving customer of type $i$ is placed into a server with configuration $k - e_i$, where $X_{k-e_i} > 0$ or $k - e_i = 0$, such that $\tilde{\Delta}_{(k,i)}(X)$ is minimal. Ties are broken arbitrarily.

In this paper, for the open system, we will analyze not the "pure" GSS policy described above, but its slight modification, called Modified GSS (GSS-M).

GSS-M. Under this policy, a token of type $i$ is generated immediately upon each service completion of type $i$, and is placed for "service" immediately according to GSS. The system state $X = \{X_k, k \in \mathcal{K}\}$ accounts for both tokens of type $i$ as well as actual type-$i$ customers for all $i \in \mathcal{I}$. Each arriving type-$i$ customer first seeks to replace an existing token of type $i$ already in "service" (chosen arbitrarily), and if there is none, it is placed for service according to GSS. Each token that is not replaced by an actual arriving customer before an independent exponentially distributed timeout with mean $1/\mu_0$ leaves the system. (This modification is the same as the one introduced in [14] for the Greedy algorithm, to obtain the Greedy-M policy.)
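The token mechanism of GSS-M can be sketched as event handlers acting on a list of servers, each stored as a pair (all customers including tokens, actual customers only). This is a hypothetical, simplified bookkeeping sketch rather than the authors' implementation: the class and method names are ours, customer types are indexed from 0, tokens are counted in $Z$ (our reading of "total number of customers" when $X$ includes tokens), and "chosen arbitrarily" is resolved by taking the first match. Random timing (Poisson arrivals, exponential service times and timeouts) is left to the caller.

```python
def _minus(k, i):
    """Return k - e_i."""
    return tuple(c - 1 if j == i else c for j, c in enumerate(k))

class GSSMSketch:
    """Hypothetical GSS-M bookkeeping. Each server is [all_counts, actual_counts]."""

    def __init__(self, configs, p):
        self.configs = [tuple(k) for k in configs]  # non-zero feasible configurations
        self.p = p
        self.servers = []  # occupied servers only; empty ones are dropped

    def projection(self):
        """X_k: number of servers whose (actual + token) configuration is k."""
        X = {}
        for all_c, _ in self.servers:
            X[tuple(all_c)] = X.get(tuple(all_c), 0) + 1
        return X

    def _place_by_gss(self, i, actual):
        X = self.projection()
        Z = sum(sum(k) * n for k, n in X.items())  # customers incl. tokens (assumption)

        def w(x):
            return 0.0 if x == 0 else min(1.0, x / Z ** self.p)

        best_src, best_delta = None, None
        for k in self.configs:
            if k[i] < 1:
                continue
            src = _minus(k, i)
            if any(src) and X.get(src, 0) == 0:
                continue  # no server with configuration k - e_i
            delta = w(X.get(k, 0)) - w(X.get(src, 0) if any(src) else 0)
            if best_delta is None or delta < best_delta:
                best_src, best_delta = src, delta
        if any(best_src):
            server = next(s for s in self.servers if tuple(s[0]) == best_src)
        else:
            server = [[0] * len(best_src), [0] * len(best_src)]  # open an empty server
            self.servers.append(server)
        server[0][i] += 1
        if actual:
            server[1][i] += 1

    def _drop_if_empty(self, server):
        if not any(server[0]):
            self.servers.remove(server)

    def arrival(self, i):
        """Type-i arrival: replace a type-i token if one exists, else place by GSS."""
        for s in self.servers:
            if s[0][i] > s[1][i]:
                s[1][i] += 1
                return
        self._place_by_gss(i, actual=True)

    def service_completion(self, server, i):
        """An actual type-i customer leaves `server`; a type-i token is placed by GSS."""
        server[0][i] -= 1
        server[1][i] -= 1
        self._drop_if_empty(server)
        self._place_by_gss(i, actual=False)

    def token_timeout(self, server, i):
        """A type-i token in `server` (requires server[0][i] > server[1][i]) expires."""
        server[0][i] -= 1
        self._drop_if_empty(server)

m = GSSMSketch([(1, 0), (2, 0), (0, 1), (0, 2)], p=0.75)
m.arrival(0)
m.arrival(0)
print(m.servers)  # one server holding two actual type-0 customers
```

After a service completion in this server, the generated token is placed back into the same server (the greedy choice), so the next type-0 arrival simply replaces that token, which illustrates the "in advance" placement discussed in the Remark below.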

We emphasize that GSS and GSS-M do not require the 
knowledge of parameter r. 

Since the system evolution under GSS-M involves both actual customers and tokens, we need to define the Markov chain describing this evolution more precisely. A complete server configuration is defined (in the same way as in [14]) as a pair $(k, \underline{k})$, where vector $k = (k_1, \dots, k_I) \in \mathcal{K}$ gives the numbers of all customers (both actual and tokens) in a server, while vector $\underline{k} \le k$, $\underline{k} \in \bar{\mathcal{K}}$, gives the numbers of actual customers only. The Markov process state at time $t$ is the vector $\{X^r_{(k,\underline{k})}(t)\}$, where the index $(k, \underline{k})$ takes values that are all possible complete server configurations, and superscript $r$, as usual, indicates the system with parameter $r$. Note that $X^r(t) = \{X^r_k(t), k \in \mathcal{K}\}$ can be considered as a "projection" of $\{X^r_{(k,\underline{k})}(t)\}$, with $X^r_k(t) = \sum_{\underline{k} : \underline{k} \le k} X^r_{(k,\underline{k})}(t)$ for each $k \in \mathcal{K}$. Let $Y^r_i(t)$, $\hat{Y}^r_i(t)$, and $\bar{Y}^r_i(t) = Y^r_i(t) + \hat{Y}^r_i(t)$ denote the total number of actual type-$i$ customers, the total number of type-$i$ tokens, and the total number of all (both actual and tokens) type-$i$ customers in the $r$th system, respectively. The total number of actual customers of all types is then $Z^r(t) = \sum_i Y^r_i(t)$. The behaviors of the processes $\{(Y^r_i(t), \hat{Y}^r_i(t)), t \ge 0\}$ are independent across all $i$, with $Y^r_i(\infty)$ having Poisson distribution with mean $\rho_i r$. The following fact has the same proof as Lemma 11 in [14].

Lemma 5. The Markov chain $\{X^r_{(k,\underline{k})}(t)\}$, $t \ge 0$, is irreducible and positive recurrent for each $r$.

Remark. Informally, the reason (which is the same as in [13]) for considering a modified version of GSS instead of pure GSS in an open system is as follows. Recall that in a closed system, a customer migration can be also thought of as its departure followed immediately by an arrival of the same type. As such, departures and arrivals in a closed system are perfectly "synchronized", which in particular means that in a closed system, for every departing customer, we always have the option of putting it right back into the server which it has just departed from. This means that a greedy control, pursuing minimization of a given objective function, cannot possibly increase (up to a first-order approximation) the objective function at any customer migration. In contrast, in an open system, departures and arrivals are not synchronized. Therefore, it is not immediately clear that a greedy algorithm will necessarily improve the objective. The tokens are introduced so that, informally speaking, the decisions on placements of new type-$i$ arrivals are made somewhat "in advance", at the times of prior type-$i$ departures. In this sense, the behavior of an open system "emulates" that of a corresponding closed system.

2.4 Main Results 

Theorem 6. Let $p \in (\frac{1}{2}, 1)$. For each $r$, consider the closed system operating under GSS policy, in steady state. Then there exists some constant $C > 0$, not depending on $r$, such that

$$\mathbb{P}\big(d(x^r(\infty), \mathcal{X}^*) \le C r^{p-1}\big) \to 1$$

as $r \to \infty$. Consequently, we have fluid-scale asymptotic optimality:

$$d(x^r(\infty), \mathcal{X}^*) \Rightarrow 0.$$

Theorem 7. Let $p \in (\frac{1}{2}, 1)$. For each $r$, consider the open system operating under GSS-M policy, in steady state. Then there exists some constant $C > 0$, not depending on $r$, such that as $r \to \infty$,

$$\mathbb{P}\big(d(x^r(\infty), \mathcal{X}^*) \le C r^{p-1}\big) \to 1, \qquad (6)$$
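and

$$r^{-p} \sum_{i} \hat{Y}^r_i(\infty) \Rightarrow 0. \qquad (7)$$

Consequently, we have fluid-scale asymptotic optimality: $d(x^r(\infty), \mathcal{X}^*) \Rightarrow 0$ and $r^{-1} \sum_i \hat{Y}^r_i(\infty) \Rightarrow 0$.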

3. CLOSED SYSTEM: ASYMPTOTIC OPTIMALITY OF GSS

We restrict our attention to closed systems and prove Theorem 6 in this section. As mentioned earlier, it is not sufficient to consider only the system states at the fluid scale, defined in Section 2.2. We also need the concept of local fluid scaling, introduced below. Proposition 9, a key step in the proof of Theorem 6, is established in Section 3.2. In Section 3.3, we construct an appropriate probability space, quantify the drift of $F^r$ under GSS (cf. Propositions 14 and 15), and prove Theorem 6.

3.1 Local Fluid Scaling 

Besides the fluid-scaled processes $x^r(t)$ defined in Section 2.2, it is also convenient to consider the system dynamics at the local fluid scale. More precisely, for each $r$ and $t$, define the corresponding local fluid scale process $\tilde{x}^r(t)$ by

$$\tilde{x}^r(t) = X^r(t)/r^p.$$

In the asymptotic regime $r \to \infty$, recall that the fluid scale process $x^r(\cdot)$ always lives in the compact set $\mathcal{X}$ (defined in Section 2.2). This is no longer true for the local fluid scale processes $\tilde{x}^r(\cdot)$: for a fixed $t$, $\{\tilde{x}^r(t)\}_r$ can be unbounded. However, at the local fluid scale, we will always consider the following weight function $w$, which remains bounded.

Define the local-fluid-scale weight function $w : \mathbb{R}_+ \cup \{\infty\} \to \mathbb{R}_+$ to be $w(x) = 1 \wedge x$. By convention, $1 < \infty$, so $w$ is well-defined. Note that for every $r$, $w(\tilde{x}^r_k) = w^r(X^r_k)$, where $\tilde{x}^r = X^r/r^p$. For $(k,i) \in \mathcal{M}$, we can also define the weight difference at the local fluid scale to be

$$\tilde{\Delta}_{(k,i)}(\tilde{x}) = w(\tilde{x}_k) - w(\tilde{x}_{k-e_i}).$$

Remark. In the sequel, we will always use lower case $x$ to denote quantities at the fluid scale, $\tilde{x}$ to denote quantities at the local fluid scale, and upper case $X$ to denote quantities without scaling.

3.2 Key Proposition 

For a vector $\tilde{x} \in (\mathbb{R}_+ \cup \{\infty\})^{|\mathcal{K}|}$ with possibly infinite components, we can define the concept of a Strictly Improving (SI) pair associated with $\tilde{x}$.

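Definition 8 (Strictly Improving (SI) pair). For $(k,i), (k',i) \in \mathcal{M}$, $\{(k,i), (k',i)\}$ is an SI pair associated with $\tilde{x}$ if

(a) $k_i \ge 1$, $\tilde{x}_k > 0$;

(b) either $k' = e_i$, or [$k'_i \ge 1$ and $\tilde{x}_{k'-e_i} > 0$]; and

(c) $\tilde{\Delta}_{(k',i)}(\tilde{x}) < \tilde{\Delta}_{(k,i)}(\tilde{x})$.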

The idea of SI pairs is as follows. Suppose that the current system state is $X^r$, and a type-$i$ customer has just completed its service at a server with configuration $k$. Then the first-order change in $F^r$ is $-\Delta^r_{(k,i)}(X^r)$. Suppose that this customer is then placed, under GSS, into a server with configuration $k' - e_i$, which thereby acquires configuration $k'$. Then the total (first-order) change in $F^r$ after this transition is $\Delta^r_{(k',i)}(X^r) - \Delta^r_{(k,i)}(X^r)$, or equivalently $\tilde{\Delta}_{(k',i)}(\tilde{x}^r) - \tilde{\Delta}_{(k,i)}(\tilde{x}^r)$. The existence of an SI pair therefore ensures that we can always improve (up to first order) the current value of $F^r$.
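As an illustration only (not code from the paper), the first-order bookkeeping behind Definition 8 can be sketched in Python. The sketch assumes a local-fluid-scale state stored as a dictionary from configuration tuples to values of $\tilde{x}_k$ (with `math.inf` for infinite components) and assigns weight 0 to the empty configuration; the helper names `w`, `minus_e`, `delta` and `is_si_pair` and the toy state are hypothetical.

```python
from math import inf

def w(x):
    # Local-fluid-scale weight w(x) = 1 ∧ x, with w(inf) = 1.
    return min(1.0, x)

def minus_e(k, i):
    # Configuration k - e_i (one fewer type-i customer).
    return tuple(kj - 1 if j == i else kj for j, kj in enumerate(k))

def delta(x, k, i):
    # Δ̃_(k,i)(x) = w(x_k) - w(x_{k - e_i}); assumption: the empty configuration has weight 0.
    k_minus = minus_e(k, i)
    w_minus = 0.0 if sum(k_minus) == 0 else w(x.get(k_minus, 0.0))
    return w(x.get(k, 0.0)) - w_minus

def is_si_pair(x, k, k_prime, i):
    # Conditions (a)-(c) of Definition 8: departure from configuration k,
    # placement that creates configuration k'.
    e_i = tuple(1 if j == i else 0 for j in range(len(k)))
    cond_a = k[i] >= 1 and x.get(k, 0.0) > 0
    cond_b = k_prime == e_i or (k_prime[i] >= 1 and x.get(minus_e(k_prime, i), 0.0) > 0)
    cond_c = delta(x, k_prime, i) < delta(x, k, i)
    return cond_a and cond_b and cond_c

# Hypothetical toy state with two customer types (indices 0 and 1).
x_tilde = {(1, 0): inf, (2, 0): 0.5, (0, 1): inf, (1, 1): 0.0}
print(delta(x_tilde, (1, 0), 0), delta(x_tilde, (2, 0), 0))  # 1.0 -0.5
print(is_si_pair(x_tilde, (1, 0), (2, 0), 0))  # True
print(is_si_pair(x_tilde, (2, 0), (1, 0), 0))  # False
```

In this toy state, moving a type-0 customer out of a server that holds it alone and into another such server forms an SI pair, with first-order change $-0.5 - 1.0 = -1.5$; the reverse move does not.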

Recall that for any feasible system state $X^r$, $x^r = X^r/r$ denotes the fluid-scale system state, and $\tilde{x}^r = X^r/r^p$ denotes the associated state at the local fluid scale. The following proposition establishes that whenever $x^r$ is sufficiently far from optimality, an SI pair exists.

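Proposition 9. Let $D > 0$ be the same as in Lemma 3. Then there exists a positive constant $\varepsilon$ such that the following holds. For sufficiently large $r$, if $d(x^r, \mathcal{X}^*) > 2D|\mathcal{K}| r^{p-1}$, then there exists an SI pair $\{(k',i), (k,i)\}$ (possibly depending on $r$) associated with $\tilde{x}^r = (\tilde{x}^r_k)_{k \in \mathcal{K}}$, and furthermore, $\tilde{x}^r_k > \varepsilon$, $\tilde{x}^r_{k'-e_i} > \varepsilon$ and $\tilde{\Delta}_{(k',i)}(\tilde{x}^r) - \tilde{\Delta}_{(k,i)}(\tilde{x}^r) < -\varepsilon$.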

Proposition 9 follows from the two lemmas below.

Lemma 10. Consider any sequence $\{x^r\}$ and the associated states $\tilde{x}^r$. Let $x \in \mathcal{X}$ be a limit point of the sequence $\{x^r\}$, so that the subsequence $\{r_n\}$ of $\{r\}$ satisfies $x^{r_n} \to x$ and $\tilde{x}^{r_n} \to \tilde{x}$ as $n \to \infty$, with some components of $\tilde{x}$ being possibly infinite. If there is no SI pair associated with $\tilde{x}$, then $x \in \mathcal{X}^*$, i.e., $x$ is an optimal solution of LP.



Proof of Lemma 10. Suppose that there is no SI pair associated with $\tilde{x}$. We will show that $x \in \mathcal{X}^*$, i.e., $x$ is an optimal solution of the linear program LP. To this end, we will use Lemma 2. In particular, we will construct $\eta_i \ge 0$, $i \in \mathcal{I}$, such that

(i) $\sum_{i \in \mathcal{I}} k_i \eta_i \le 1$ for $k \in \mathcal{K}$, and

(ii) if $\sum_{i \in \mathcal{I}} k_i \eta_i < 1$, then $\tilde{x}_k < 1$.

Note that condition (ii) here is stronger than condition (ii) in Lemma 2.

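Let $\eta_i = w(\tilde{x}_{e_i})$ for all $i \in \mathcal{I}$. Then clearly $\eta_i \in [0,1]$ for all $i \in \mathcal{I}$. We first show that condition (i) holds. To this end, we prove the following stronger statement: if $k \in \mathcal{K}$ is such that $k_i \ge 1$ implies $\eta_i > 0$, then $\sum_{i \in \mathcal{I}} k_i \eta_i = w(\tilde{x}_k)$. Suppose not. Let $k \in \mathcal{K}$ be a minimal counterexample, so that

$$\sum_{i \in \mathcal{I}} k_i \eta_i \ne w(\tilde{x}_k), \qquad (8)$$

and for each $i \in \mathcal{I}$, $k_i \ge 1$ implies $\eta_i > 0$. Note that $\sum_{i \in \mathcal{I}} k_i \ge 2$, since $\eta_i = w(\tilde{x}_{e_i})$ for each $i \in \mathcal{I}$, by definition. Thus, there exists $i \in \mathcal{I}$ such that $\eta_i > 0$, $k' = k - e_i \in \mathcal{K}$, and, by minimality of $k$,

$$\sum_{i' \in \mathcal{I}} k'_{i'} \eta_{i'} = w(\tilde{x}_{k'}). \qquad (9)$$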

Subtracting Eq. (9) from Eq. (8), we get that

$$\tilde{\Delta}_{(k,i)} = w(\tilde{x}_k) - w(\tilde{x}_{k'}) \ne \eta_i.$$

Thus either $\tilde{\Delta}_{(k,i)} > \eta_i$, or $\tilde{\Delta}_{(k,i)} < \eta_i$. If $\tilde{\Delta}_{(k,i)} > \eta_i$, we verify that $\{(k,i), (e_i,i)\}$ is an SI pair associated with $\tilde{x}$. First, conditions (b) and (c) in Definition 8 are automatically satisfied. Second, $\tilde{\Delta}_{(k,i)} > \eta_i > 0$. In particular, $\tilde{x}_k > 0$. We also have $k_i \ge 1$, so condition (a) in Definition 8 is also satisfied.

If $\tilde{\Delta}_{(k,i)} < \eta_i$, we verify that $\{(e_i,i), (k,i)\}$ is an SI pair associated with $\tilde{x}$. First, condition (c) in Definition 8 is automatically satisfied. Second, since $\eta_i > 0$, $\tilde{x}_{e_i} > 0$. Thus condition (a) in Definition 8 is satisfied. Finally, $k_i \ge 1$ by assumption, so to verify condition (b), we only need to verify that $\tilde{x}_{k-e_i} > 0$. Since $\sum_{i' \in \mathcal{I}} k_{i'} \ge 2$, we have $\sum_{i' \in \mathcal{I}} k'_{i'} \ge 1$. This implies that there exists $i' \in \mathcal{I}$ such that $k'_{i'} \ge 1$. Thus $k_{i'} \ge k'_{i'} \ge 1$, so $\eta_{i'} > 0$. By Eq. (9), $w(\tilde{x}_{k'}) \ge \eta_{i'} > 0$, so $\tilde{x}_{k'} > 0$. Thus, condition (b) in Definition 8 is verified.

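In either case, we have an SI pair associated with $\tilde{x}$, contradicting the assumption that there is no SI pair associated with $\tilde{x}$. Thus, for all $k \in \mathcal{K}$ such that $k_i \ge 1$ implies $\eta_i > 0$,

$$\sum_{i \in \mathcal{I}} k_i \eta_i = w(\tilde{x}_k).$$

For all $k \in \mathcal{K}$, we can find $k' \le k$ such that $k' \in \mathcal{K}$, $k'_i \ge 1$ implies $\eta_i > 0$, and $\sum_{i \in \mathcal{I}} k_i \eta_i = \sum_{i \in \mathcal{I}} k'_i \eta_i$ (obtained by removing from $k$ all customers of types $i$ with $\eta_i = 0$). Thus,

$$\sum_{i \in \mathcal{I}} k_i \eta_i = \sum_{i \in \mathcal{I}} k'_i \eta_i = w(\tilde{x}_{k'}) \le 1.$$

This establishes condition (i).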

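We now establish condition (ii). Suppose that condition (ii) does not hold. Let $k \in \mathcal{K}$ be minimal such that

$$\tilde{x}_k \ge 1, \quad \text{and} \quad \sum_{i \in \mathcal{I}} k_i \eta_i < 1.$$

First, note that $k \ne e_i$ for any $i \in \mathcal{I}$, because if $\eta_i < 1$, then

$$1 > \eta_i = w(\tilde{x}_{e_i}) = 1 \wedge \tilde{x}_{e_i}.$$

Thus $\sum_{i \in \mathcal{I}} k_i \ge 2$. Second, if $\eta_i > 0$ for all $i \in \mathcal{I}$ with $k_i \ge 1$, then from the proof of condition (i), we have that

$$1 > \sum_{i \in \mathcal{I}} k_i \eta_i = w(\tilde{x}_k) = 1 \wedge \tilde{x}_k,$$

so we have $\tilde{x}_k < 1$, reaching a contradiction. Thus, there exists $i \in \mathcal{I}$ such that $\eta_i = 0$ and $k_i \ge 1$. Let $k' = k - e_i$. Then $k' \in \mathcal{K}$, since

$$\sum_{i' \in \mathcal{I}} k'_{i'} = \sum_{i' \in \mathcal{I}} k_{i'} - 1 \ge 1.$$

Since $\eta_i = 0$,

$$\sum_{i' \in \mathcal{I}} k'_{i'} \eta_{i'} = \sum_{i' \in \mathcal{I}} k_{i'} \eta_{i'} < 1.$$

By minimality of $k$, we must have $\tilde{x}_{k'} < 1$. Thus, $w(\tilde{x}_{k'}) = 1 \wedge \tilde{x}_{k'} < 1$, and $w(\tilde{x}_k) = 1 \wedge \tilde{x}_k = 1$. This implies that

$$\tilde{\Delta}_{(k,i)} > 0 = \eta_i = \tilde{\Delta}_{(e_i,i)},$$

and that $\{(k,i), (e_i,i)\}$ is an SI pair associated with $\tilde{x}$. This is a contradiction, so condition (ii) is established. □

Lemma 11. Consider any sequence $\{x^r\}$ and associated states $\tilde{x}^r$. Let $x^{r_n}$, $x$, $\tilde{x}^{r_n}$ and $\tilde{x}$ be the same as in Lemma 10. If for all sufficiently large $n$, $d(x^{r_n}, \mathcal{X}^*) > 2D|\mathcal{K}| r_n^{p-1}$, then there is an SI pair associated with $\tilde{x}$.

Proof of Lemma 11. We prove the lemma by contradiction. Suppose that the lemma is not true; then for sufficiently large $n$, $d(x^{r_n}, \mathcal{X}^*) > 2D|\mathcal{K}| r_n^{p-1}$, and there is no SI pair associated with $\tilde{x}$. By Lemma 10, $x$ is an optimal solution of LP, and from the proof of Lemma 10, $\eta = (\eta_i)_{i \in \mathcal{I}}$ is an optimal dual solution of LP, where $\eta_i = w(\tilde{x}_{e_i})$ for all $i \in \mathcal{I}$.

For a given $r$, consider the following linear program, which we call LP$^r$:

$$\text{Minimize} \quad \sum_{k \in \mathcal{K}} \tilde{x}_k \qquad (10)$$

$$\text{subject to} \quad \sum_{k \in \mathcal{K}} k_i \tilde{x}_k = \rho_i r^{1-p}, \quad \text{for all } i \in \mathcal{I}, \qquad (11)$$

$$\tilde{x}_k \ge 0, \quad \text{for all } k \in \mathcal{K}. \qquad (12)$$

LP$^r$ is just a scaled version of LP, defined in Section 2.2. For each $r$, the feasible set of LP$^r$ is $r^{1-p}\mathcal{X}$, its set of optimal solutions is $r^{1-p}\mathcal{X}^*$, and its optimal value is $r^{1-p}u^*$. $r^{1-p}x$ is an optimal solution of LP$^r$, and $\eta$ is an optimal dual solution. Furthermore, by Lemma 3, for sufficiently large $n$,

$$\sum_{k \in \mathcal{K}} \tilde{x}^{r_n}_k - r_n^{1-p} u^* = r_n^{1-p}\Big(\sum_{k \in \mathcal{K}} x^{r_n}_k - u^*\Big) \ge r_n^{1-p}\, d(x^{r_n}, \mathcal{X}^*)/D > r_n^{1-p} \cdot \big(2D|\mathcal{K}| r_n^{p-1}\big)/D = 2|\mathcal{K}|.$$

For each $n$, consider the Lagrangian $L(\tilde{x}^{r_n}, \eta)$ of LP$^{r_n}$, evaluated at $\tilde{x}^{r_n}$ and $\eta$:

$$L(\tilde{x}^{r_n}, \eta) = \sum_{k \in \mathcal{K}} \tilde{x}^{r_n}_k + \sum_{i \in \mathcal{I}} \eta_i \Big(\rho_i r_n^{1-p} - \sum_{k \in \mathcal{K}} k_i \tilde{x}^{r_n}_k\Big).$$

We calculate the Lagrangian in two ways. First, by feasibility of $\tilde{x}^{r_n}$, $L(\tilde{x}^{r_n}, \eta) = \sum_{k \in \mathcal{K}} \tilde{x}^{r_n}_k$. Second, we rewrite $L(\tilde{x}^{r_n}, \eta)$ as

$$L(\tilde{x}^{r_n}, \eta) = \sum_{i \in \mathcal{I}} \eta_i \rho_i r_n^{1-p} + \sum_{k \in \mathcal{K}} \tilde{x}^{r_n}_k \Big(1 - \sum_{i \in \mathcal{I}} k_i \eta_i\Big).$$

The first term on the RHS equals $r_n^{1-p} u^*$, by the dual optimality of $\eta$. For the second term on the RHS, note that in the proof of Lemma 10 we have established that for all $k \in \mathcal{K}$, $\sum_{i \in \mathcal{I}} k_i \eta_i \le 1$, and if $\sum_{i \in \mathcal{I}} k_i \eta_i < 1$, then $\tilde{x}_k < 1$. Since $\tilde{x}^{r_n} \to \tilde{x}$, for all sufficiently large $n$, if $\sum_{i \in \mathcal{I}} k_i \eta_i < 1$, then $\tilde{x}^{r_n}_k < 1$. Thus for all sufficiently large $n$,

$$\sum_{k \in \mathcal{K}} \tilde{x}^{r_n}_k \Big(1 - \sum_{i \in \mathcal{I}} k_i \eta_i\Big) \le \sum_{k \in \mathcal{K}} \Big(1 - \sum_{i \in \mathcal{I}} k_i \eta_i\Big) \le |\mathcal{K}|,$$

and

$$\sum_{k \in \mathcal{K}} \tilde{x}^{r_n}_k - r_n^{1-p} u^* = L(\tilde{x}^{r_n}, \eta) - r_n^{1-p} u^* \le |\mathcal{K}|,$$

contradicting the fact that

$$\sum_{k \in \mathcal{K}} \tilde{x}^{r_n}_k - r_n^{1-p} u^* > 2|\mathcal{K}|$$

for sufficiently large $n$. This establishes Lemma 11. □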
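The two ways of computing the Lagrangian in the proof of Lemma 11 rest on an algebraic regrouping that holds for any $\tilde{x}$ and $\eta$. A minimal Python check with exact rational arithmetic follows; the configurations and the values of $\rho$, $\eta$ and $\tilde{x}$ are made up for illustration, and the hypothetical parameter `scale` plays the role of $r^{1-p}$.

```python
from fractions import Fraction as F

def lagrangian_direct(x, eta, rho, scale):
    # L(x, eta) = sum_k x_k + sum_i eta_i * (rho_i * scale - sum_k k_i * x_k)
    total = sum(x.values())
    for i, eta_i in enumerate(eta):
        load = sum(k[i] * xk for k, xk in x.items())
        total += eta_i * (rho[i] * scale - load)
    return total

def lagrangian_regrouped(x, eta, rho, scale):
    # Same quantity: sum_i eta_i * rho_i * scale + sum_k x_k * (1 - sum_i k_i * eta_i)
    dual_part = sum(e * r for e, r in zip(eta, rho)) * scale
    slack_part = sum(xk * (1 - sum(k[i] * eta[i] for i in range(len(eta))))
                     for k, xk in x.items())
    return dual_part + slack_part

# A feasible point: type-0 load 4 + 3 = 7 = rho_0 * scale, type-1 load 2 + 3 = 5 = rho_1 * scale.
x = {(1, 0): F(4), (0, 1): F(2), (1, 1): F(3)}
eta, rho, scale = [F(1, 2), F(1, 2)], [F(7, 2), F(5, 2)], F(2)
print(lagrangian_direct(x, eta, rho, scale), lagrangian_regrouped(x, eta, rho, scale))  # 9 9
```

For a feasible point the constraint term vanishes, so both expressions equal $\sum_k \tilde{x}_k$; the regrouped form is the one in which the bound on $\sum_k \tilde{x}_k (1 - \sum_i k_i \eta_i)$ is applied.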

Proof of Proposition 9. We are now ready to prove Proposition 9. Suppose that the proposition does not hold. Then for all $\varepsilon > 0$, there exist infinitely many $r$ and $x^r$ such that $d(x^r, \mathcal{X}^*) > 2D|\mathcal{K}| r^{p-1}$ and for all SI pairs (if any) $\{(k',i),(k,i)\}$ of $\tilde{x}^r$, either $\tilde{x}^r_k \le \varepsilon$, or $\tilde{x}^r_{k'-e_i} \le \varepsilon$, or $\tilde{\Delta}_{(k',i)}(\tilde{x}^r) - \tilde{\Delta}_{(k,i)}(\tilde{x}^r) \ge -\varepsilon$. Thus, we can find a subsequence $\{r_n\}$ of $\{r\}$ and states $x^{r_n}$ such that

1. $x^{r_n} \to x \in \mathcal{X}$ as $n \to \infty$,

2. $\tilde{x}^{r_n} \to \tilde{x}$ as $n \to \infty$, with some components of $\tilde{x}$ being possibly infinite,

3. $d(x^{r_n}, \mathcal{X}^*) > 2D|\mathcal{K}| r_n^{p-1}$ for all $n$, and

4. for all SI pairs $\{(k',i),(k,i)\}$ associated with $\tilde{x}^{r_n}$ (if any), either $\tilde{x}^{r_n}_k \le 1/n$, or $\tilde{x}^{r_n}_{k'-e_i} \le 1/n$, or $\tilde{\Delta}_{(k',i)}(\tilde{x}^{r_n}) - \tilde{\Delta}_{(k,i)}(\tilde{x}^{r_n}) \ge -1/n$.

From Property 4, we can deduce that $\tilde{x}$ does not have an SI pair. But by Property 3, this contradicts Lemma 11. This establishes Proposition 9. □

3.3 Proof of Theorem 6

We will assume WLOG the following construction of the probability space. For each $(k,i) \in \mathcal{M}$, consider an independent unit-rate Poisson process $\{\Pi_{(k,i)}(t), t \ge 0\}$. Assume that, for each $r$, the Markov process $X^r(\cdot)$ is driven by this common set of Poisson processes $\Pi_{(k,i)}(\cdot)$, as follows. For each $(k,i) \in \mathcal{M}$, let us denote by $D^r_{(k,i)}(t)$ the total number of type-$i$ service completions from servers of configuration $k$ in the time interval $[0,t]$. Then

$$D^r_{(k,i)}(t) = \Pi_{(k,i)}\left(\int_0^t X^r_k(\xi)\, k_i \mu_i \, d\xi\right). \qquad (13)$$

Lemma 12. Let $T > 0$ be fixed. With probability 1, the following property holds. Consider any sequence $\{t^r_0\}_r$ with $t^r_0 \in [0, T r^{2-p}]$. Then for any $\xi \in [0,1]$, and for any $(k,i) \in \mathcal{M}$,

$$\frac{1}{r^{2p-1}}\left(\Pi_{(k,i)}(t^r_0 + \xi r^{2p-1}) - \Pi_{(k,i)}(t^r_0)\right) \to \xi$$

as $r \to \infty$. The convergence is uniform over $t^r_0$, $\xi$, and $(k,i)$ in the following sense. For any $\varepsilon > 0$, there exists $r(\varepsilon)$ such that for all $r > r(\varepsilon)$, $\xi \in [0,1]$, $(k,i) \in \mathcal{M}$, and $t^r_0 \in [0, T r^{2-p}]$,

$$\left|\frac{1}{r^{2p-1}}\left(\Pi_{(k,i)}(t^r_0 + \xi r^{2p-1}) - \Pi_{(k,i)}(t^r_0)\right) - \xi\right| < \varepsilon.$$

The proof of Lemma 12 depends on simple large-deviation type estimates for Poisson random variables. The idea is essentially the same as that of Lemma 4.3 in [11]: we partition the interval $[0, T r^{2-p}]$ into subintervals of length $r^{p-1/2}$, and for each of them write the probability that the average increase rate of $\Pi_{(k,i)}$ lies outside $(1-\varepsilon, 1+\varepsilon)$. These probabilities are $\exp(-\mathrm{poly}(r))$, and we only have $\mathrm{poly}(r)$ such subintervals (here $\mathrm{poly}(r)$ means a polynomial in $r$). This is true for any $\varepsilon > 0$. We can then cover any subinterval of length $r^{2p-1}$ by these subintervals of length $r^{p-1/2}$. We omit a detailed proof here.

The following corollary is a simple consequence of Lemma 12.

Corollary 13. Let $T$ be fixed. With probability 1, the following holds. For sufficiently large $r$,

$$\max_{\xi \in [0,1],\; t_0 \in [0, T r^{1-p}]} d\left(X^r(t_0 + \xi r^{p-1}), X^r(t_0)\right) < 2\bar{\mu}|\mathcal{K}| r^p, \qquad (14)$$

where $\bar{\mu} = \max_{i \in \mathcal{I}} \mu_i$, and $\mu_i$ is the service rate for type-$i$ customers.
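The concentration that Lemma 12 formalizes can be seen in a small Monte Carlo sketch (an illustration, not part of the proof). It simulates a unit-rate Poisson process with exponential gaps from Python's `random` module and reports the largest relative deviation of its counts over consecutive windows of a fixed length; the window lengths, window counts and seeds are arbitrary choices, and the function names are hypothetical.

```python
import random

def poisson_arrival_times(horizon, rng):
    # Unit-rate Poisson process on [0, horizon] via i.i.d. Exp(1) gaps.
    times, t = [], rng.expovariate(1.0)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(1.0)
    return times

def max_window_deviation(window, n_windows, seed=0):
    # max over consecutive windows [t0, t0 + window) of |(Pi(t0 + window) - Pi(t0)) / window - 1|
    rng = random.Random(seed)
    counts = [0] * n_windows
    for t in poisson_arrival_times(window * n_windows, rng):
        counts[min(int(t // window), n_windows - 1)] += 1
    return max(abs(c / window - 1.0) for c in counts)

print(max_window_deviation(2000, 50))            # a few percent
print(max_window_deviation(20000, 10, seed=1))   # smaller still
```

Longer windows give smaller deviations, in line with the large-deviation estimates used in the proof idea; the lemma itself requires window lengths that grow polynomially in $r$.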

Proof. Consider the probability-1 event in Lemma 12, in which we can and do replace $T$ with $2\bar{\mu}T$. (We do this because the total "instantaneous" rate of all transitions is upper bounded by $2\bar{\mu}r$.) The rate of departure of type-$i$ customers is $\rho_i \mu_i r \le \rho_i \bar{\mu} r$, and the total rate of customer departure is no greater than $\sum_{i \in \mathcal{I}} \rho_i \bar{\mu} r = \bar{\mu} r$. Thus, for each $k \in \mathcal{K}$, the rate of change in $X^r_k$ is at most $\bar{\mu} r$. For an interval of length $r^{p-1}$, the total change in $X^r_k$ is at most $O(r \cdot r^{p-1}) = O(r^p)$. More precisely, with probability 1, for each $k \in \mathcal{K}$,

$$\limsup_{r \to \infty} \frac{1}{r^p} \max_{\xi \in [0,1],\; t_0 \in [0, T r^{1-p}]} \left|X^r_k(t_0 + \xi r^{p-1}) - X^r_k(t_0)\right| \le \bar{\mu}.$$

Thus, for sufficiently large $r$, and for each $k \in \mathcal{K}$,

$$\max_{\xi \in [0,1],\; t_0 \in [0, T r^{1-p}]} \left|X^r_k(t_0 + \xi r^{p-1}) - X^r_k(t_0)\right| < 2\bar{\mu} r^p.$$

Summing the above expression over $k \in \mathcal{K}$ establishes the corollary. □

Proposition 14. There exist positive constants $C_1$ and $\delta$ such that the following holds. Let $T > 0$ be given. Then w.p.1, for all sufficiently large $r$, and for any interval $[t_0, t_0 + r^{p-1}] \subseteq [0, T r^{1-p}]$, if $d(x^r(t_0), \mathcal{X}^*) > C_1 r^{p-1}$, then

$$F^r(X^r(t_0 + r^{p-1})) - F^r(X^r(t_0)) \le -\delta r^{2p-1}.$$



Proof. The proof idea is as follows. Consider the increase in $F^r$ at each state transition. For concreteness, suppose that the current system state is $X^r$, and a type-$i$ customer has just completed its service on a server with configuration $k$, and is placed into a server with configuration $k'$. Then a simple calculation shows that the increase in $F^r$ is at most

$$\Delta^r_{(k',i)}(X^r) - \Delta^r_{(k,i)}(X^r) + 4r^{-p}.$$

The term $\Delta^r_{(k',i)}(X^r) - \Delta^r_{(k,i)}(X^r)$ captures the first-order increase in $F^r$, and the term $4r^{-p}$ bounds the second-order increase in $F^r$. We will see that over an interval of length $r^{p-1}$, the increase in $F^r$ due to first-order terms is at most $-\Theta(r^{2p-1})$, and the increase due to second-order terms is at most a constant. We now proceed to the formal proof.

From now on, we work with the probability-1 event defined in Lemma 12, under which

$$\frac{1}{r^{2p-1}}\left(\Pi_{(k,i)}(t_0 + \xi r^{2p-1}) - \Pi_{(k,i)}(t_0)\right) \to \xi$$

as $r \to \infty$, uniformly over $t_0$, $\xi$, and $(k,i)$. Let $C_1 = 2(\bar{\mu} + D)|\mathcal{K}|$, where $\bar{\mu} = \max_{i \in \mathcal{I}} \mu_i$ and $D$ is the same as in Lemma 3. Let $\varepsilon > 0$ be the same as in Proposition 9, and let $\delta > 0$ be such that $\delta < \mu_i \varepsilon^2/8$ for all $i \in \mathcal{I}$.

We claim that for all sufficiently large $r$, and for any interval $[t_0, t_0 + r^{p-1}] \subseteq [0, T r^{1-p}]$, if $d(x^r(t_0), \mathcal{X}^*) > C_1 r^{p-1}$, then

$$F^r(X^r(t_0 + r^{p-1})) - F^r(X^r(t_0)) \le -\delta r^{2p-1}.$$

Suppose the contrary. Then there exists a subsequence of $\{r\}$ (which, with an abuse of notation, we still index by $r$), along which we have some $[t^r_0, t^r_0 + r^{p-1}] \subseteq [0, T r^{1-p}]$, such that $d(x^r(t^r_0), \mathcal{X}^*) > C_1 r^{p-1}$, and

$$F^r(X^r(t^r_0 + r^{p-1})) - F^r(X^r(t^r_0)) > -\delta r^{2p-1}. \qquad (15)$$

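First, for sufficiently large $r$, and for all $\xi \in [0,1]$, there exists an SI pair $\{(k',i),(k,i)\}$ associated with $\tilde{x}^r(t^r_0 + \xi r^{p-1})$ (possibly depending on $r$ and $\xi$), such that

$$\tilde{x}^r_k(t^r_0 + \xi r^{p-1}) > \varepsilon, \quad \tilde{x}^r_{k'-e_i}(t^r_0 + \xi r^{p-1}) > \varepsilon, \quad \text{and} \qquad (16)$$

$$\tilde{\Delta}_{(k',i)}(\tilde{x}^r(t^r_0 + \xi r^{p-1})) - \tilde{\Delta}_{(k,i)}(\tilde{x}^r(t^r_0 + \xi r^{p-1})) < -\varepsilon. \qquad (17)$$

Indeed, by Corollary 13, for all $\xi \in [0,1]$, $d(X^r(t^r_0 + \xi r^{p-1}), X^r(t^r_0)) < 2\bar{\mu}|\mathcal{K}| r^p$. Using the triangle inequality and the choice $C_1 = 2(\bar{\mu} + D)|\mathcal{K}|$, we have that for sufficiently large $r$, and for all $\xi \in [0,1]$,

$$d(x^r(t^r_0 + \xi r^{p-1}), \mathcal{X}^*) > 2D|\mathcal{K}| r^{p-1}.$$

Then (16) and (17) follow from Proposition 9.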

Fix a sufficiently large $r$ so that (16) and (17) hold. We then consider the first-order change in $F^r$ over the interval $[t^r_0, t^r_0 + r^{p-1}]$ (i.e., the change due to the differences $\Delta^r$). To do this, we partition $[t^r_0, t^r_0 + r^{p-1}]$ into subintervals of length $c\varepsilon r^{p-1}$, with $c > 0$ chosen small enough so that on each subinterval, there exists a fixed SI pair $\{(k',i),(k,i)\}$ such that (16) and (17) hold for this SI pair, with $\varepsilon$ replaced by $\varepsilon/2$. We now argue that this can be done. Consider the first such subinterval, for example. By Lemma 12, for sufficiently large $r$, the number of state transitions over this subinterval is at most $(c\varepsilon r^{p-1}) \cdot O(r) = O(c\varepsilon r^p)$, which is smaller than $\varepsilon r^p/16$ for a sufficiently small $c$. Since each state transition changes each $X^r_k$ by at most 2, the change in each $\tilde{x}^r_k$ over this subinterval is then at most $\varepsilon/8$. As $w$ is 1-Lipschitz and (17) involves four values of $w$, (16) and (17) hold throughout the subinterval for an SI pair associated with $\tilde{x}^r(t^r_0)$, with $\varepsilon$ replaced by $\varepsilon/2$. The same argument holds for other subintervals.

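Now concentrate on the subinterval $[t^r_0, t^r_0 + c\varepsilon r^{p-1}]$, and a corresponding SI pair $\{(k',i),(k,i)\}$ associated with $\tilde{x}^r(t^r_0)$ for which (16) and (17) hold on this subinterval with $\varepsilon$ replaced by $\varepsilon/2$. The number of type-$i$ departures from servers of configuration $k$ is at least $\mu_i \cdot (\varepsilon r^p/2) \cdot (c\varepsilon r^{p-1}) = \frac12 c\mu_i\varepsilon^2 r^{2p-1}$. At each such departure, the first-order increase (due to the difference of $\Delta^r$) in $F^r$ is at most $-\varepsilon/2$, since GSS results in a first-order increase no larger than that of moving the departing customer to a server with configuration $k' - e_i$. (At every other departure the first-order increase is at most 0, since GSS can always put the customer back into the server it has just left.) Summing over all such increases over type-$i$ departures gives a first-order increase in $F^r$ which is at most

$$-\frac{\varepsilon}{2} \cdot \frac12 c\mu_i\varepsilon^2 r^{2p-1} = -\frac14 c\mu_i \varepsilon^3 r^{2p-1}.$$

Exactly the same argument holds for other subintervals, so the total first-order increase in $F^r$ is at most $-2\delta r^{2p-1}$.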
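Written out, the per-subinterval bookkeeping and the choice $\delta < \mu_i\varepsilon^2/8$ combine as follows. This only restates the steps above using quantities already introduced: the subinterval length $c\varepsilon r^{p-1}$, the lower bound $\frac12 c\mu_i\varepsilon^2 r^{2p-1}$ on type-$i$ departures from configuration $k$, and the first-order gain of at least $\varepsilon/2$ per such departure. Taking $1/(c\varepsilon)$ to be an integer without loss of generality,

$$\left(-\frac{\varepsilon}{2}\right)\cdot\frac12 c\mu_i\varepsilon^2 r^{2p-1} = -\frac14 c\mu_i\varepsilon^3 r^{2p-1} < -2\delta\,(c\varepsilon)\,r^{2p-1} \quad\text{since } 2\delta < \frac{\mu_i\varepsilon^2}{4},$$

$$\sum_{\text{$1/(c\varepsilon)$ subintervals}} \left(-2\delta\,(c\varepsilon)\,r^{2p-1}\right) = -2\delta\, r^{2p-1}.$$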

Finally, consider the second-order increase in $F^r$. As discussed at the beginning of the proof, the second-order increase in $F^r$ at each state transition is at most $4r^{-p}$. For sufficiently large $r$, the total number of state transitions over the interval $[t^r_0, t^r_0 + r^{p-1}]$ is at most $r^{p-1} \cdot O(r) = O(r^p)$, and hence the total second-order increase in $F^r$ is at most $(4r^{-p}) \cdot O(r^p) = O(1)$. Thus, for sufficiently large $r$ (recall that $2p - 1 > 0$),

$$F^r(X^r(t^r_0 + r^{p-1})) - F^r(X^r(t^r_0)) \le -2\delta r^{2p-1} + O(1) < -\delta r^{2p-1}.$$

This contradicts (15), and we have established the proposition. □

Proposition 15. There exist positive constants $C$ and $T$ such that as $r \to \infty$,

$$\mathbb{P}\left(d(x^r(T r^{1-p}), \mathcal{X}^*) < C r^{p-1}\right) \to 1.$$

Proof Sketch. The proof is very intuitive. We keep track of the evolution of $F^r$ on the interval $[0, T r^{1-p}]$, subdivided into $r^{p-1}$-long subintervals. W.p.1, for all sufficiently large $r$, the following is true for each subinterval $[t_0, t_0 + r^{p-1}]$: $F^r$ decreases by at least $\delta r^{2p-1}$ if $d(x^r(t_0), \mathcal{X}^*) > C_1 r^{p-1}$ (by Proposition 14), and it can never increase by more than $C_2 r^p$. Therefore, if we choose $T$ large enough, then $d(x^r(t), \mathcal{X}^*) \le C_1 r^{p-1}$ at some time $t \in [0, T r^{1-p}]$ (because otherwise $F^r$ would become negative), and $d(x^r(t), \mathcal{X}^*) = O(r^{p-1})$ thereafter. We refer the readers to Appendix B for details. □
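The counting in this sketch can be made concrete with a few lines of Python. Assume, for illustration only, that $F^r$ starts at most at $\kappa r$ for some constant $\kappa$ (a hypothetical constant, not from the paper) and decreases by at least $\delta r^{2p-1}$ on every $r^{p-1}$-long subinterval while far from optimality. Since $F^r$ cannot become negative, such decreases can persist for at most about $\kappa r^{2-2p}/\delta$ subintervals, i.e., for a time of about $(\kappa/\delta)\, r^{1-p}$; this is why a horizon of the form $T r^{1-p}$ with $T > \kappa/\delta$ suffices. The function names below are hypothetical.

```python
import math

def subintervals_needed(f0, delta, r, p):
    # Smallest n with n * delta * r**(2p-1) > f0: after n subintervals of length
    # r**(p-1), each with that guaranteed decrease, F^r would be negative.
    return math.floor(f0 / (delta * r ** (2 * p - 1))) + 1

def horizon_multiple(f0, delta, r, p):
    # Elapsed time n * r**(p-1), expressed as a multiple of r**(1-p).
    n = subintervals_needed(f0, delta, r, p)
    return n * r ** (p - 1) / r ** (1 - p)

# Hypothetical numbers: kappa = 2, delta = 0.5, p = 0.75, r = 1e8.
r, p, kappa, delta = 1e8, 0.75, 2.0, 0.5
print(horizon_multiple(kappa * r, delta, r, p))  # close to kappa / delta = 4
```

The multiple approaches $\kappa/\delta$ as $r$ grows, independently of $p$, which matches the choice of a fixed $T$ in Proposition 15.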

Proof of Theorem 6. Theorem 6 is now a simple consequence of Proposition 15. For each $r$, consider $x^r(\cdot)$ in the stationary regime. In particular, for any $T > 0$, $x^r(T r^{1-p})$ has the same distribution as $x^r(\infty)$. Therefore, by Proposition 15,

$$\mathbb{P}\left(d(x^r(\infty), \mathcal{X}^*) < C r^{p-1}\right) \to 1,$$

as $r \to \infty$. This completes the proof of Theorem 6. □

4. OPEN SYSTEM: ASYMPTOTIC OPTIMALITY OF (MODIFIED) GSS

We prove Theorem 7 in this section. The proof "extends" that of Theorem 6. The main additional step is Theorem 18, which shows that in steady state, for each $i \in \mathcal{I}$, $Y_i^{\prime r}(t)$, the number of tokens of type $i$, remains $o(r^p)$ with high probability over $O(r^{1-p})$-long intervals. As a starting point, we need the following facts.



Theorem 16. Consider the sequence (in $r$) of open systems in steady state. Consider any fixed $i$. There exists a positive constant $c$ such that, uniformly on all $r$,

$$\mathbb{E}\exp\left\{\left\|r^{-1/2}\left(Y^r_i(\infty) - \rho_i r,\; Y_i^{\prime r}(\infty)\right)\right\|\right\} < c.$$

Proof. See Appendix C. □

For our purposes, the following corollary will suffice.

Corollary 17. Consider the sequence (in $r$) of open systems in steady state. Consider any fixed $i$. Then, for any $q > 1/2$,

$$r^{-q}\left\|\left(Y^r_i(\infty) - \rho_i r,\; Y_i^{\prime r}(\infty)\right)\right\| \Rightarrow 0.$$



Next we show that the property of Corollary 17 holds not just at a given time, but uniformly on an $O(r^{1-p})$-long interval.

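Theorem 18. Consider the sequence (in $r$) of open systems in stationary regime. Consider any fixed $i$. Let $q > 1/2$ and $T > 0$ be fixed. Then, as $r \to \infty$,

$$\sup_{t \in [0, T r^{1-p}]} \left\|r^{-q}\left(Y^r_i(t) - \rho_i r,\; Y_i^{\prime r}(t)\right)\right\| \Rightarrow 0, \qquad (18)$$

and, consequently,

$$\sup_{t \in [0, T r^{1-p}]} r^{-q} |Z^r(t) - r| \Rightarrow 0. \qquad (19)$$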

Clearly, the statement of Theorem 18 is equivalent to the following one: any subsequence of $\{r\}$ contains a further subsequence along which, w.p.1,

$$\sup_{t \in [0, T r^{1-p}]} \left\|r^{-q}\left(Y^r_i(t) - \rho_i r,\; Y_i^{\prime r}(t)\right)\right\| \to 0, \qquad (20)$$

and then

$$\sup_{t \in [0, T r^{1-p}]} r^{-q} |Z^r(t) - r| \to 0. \qquad (21)$$

In turn, to prove the latter statement it suffices to show that there exists a construction of the underlying probability space for which the statement holds.

We will need some estimates, which can be obtained from a strong approximation of Poisson processes, available in, for example, [5, Chapters 1 and 2]:

Proposition 19. A unit rate Poisson process $\Pi(\cdot)$ and a standard Brownian motion $W(\cdot)$ can be constructed on a common probability space in such a way that the following holds. There exist fixed positive constants $C_1$, $C_2$, $C_3$ such that for all $T > 1$ and $u > 0$,

$$\mathbb{P}\left(\sup_{0 \le t \le T} |\Pi(t) - t - W(t)| > C_1 \log T + u\right) \le C_2 e^{-C_3 u}.$$

If in the above statement we replace $T$ with $rT$, and $u$ with $r^{1/4}$, we obtain

$$\mathbb{P}\left(\sup_{0 \le t \le rT} |(\Pi(t) - t) - W(t)| \le C_1 \log(rT) + r^{1/4}\right) \ge 1 - C_2 e^{-C_3 r^{1/4}}. \qquad (22)$$



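Note also that for a fixed $\delta \in (0, q - 1/2)$ and all large $r$,

$$\mathbb{P}\left(\sup_{0 \le t \le rT} |W(t)| \le r^{1/2+\delta}\right) \ge 1 - e^{-c\, r^{2\delta}} \qquad (23)$$

for some constant $c > 0$. If the events in (22) and (23) hold for all large $r$, then

$$\sup_{0 \le t \le rT} r^{-q} |\Pi(t) - t| \to 0. \qquad (24)$$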
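A quick numerical illustration of (24), with $T = 1$ (an arbitrary choice, as are $r$, $q$ and the seed): simulate a unit-rate Poisson process on $[0, r]$ and compute $r^{-q}\sup_t |\Pi(t) - t|$ exactly along the simulated path. For $q > 1/2$ the ratio is small, since the fluctuations are of order $r^{1/2}$. The function name is hypothetical.

```python
import random

def sup_centered_poisson(horizon, seed=0):
    # sup over [0, horizon] of |Pi(t) - t| for a unit-rate Poisson process Pi.
    # Between jumps Pi(t) - t decreases linearly, so the supremum is attained
    # just before a jump, right after a jump, or at the horizon.
    rng = random.Random(seed)
    t, count, worst = 0.0, 0, 0.0
    while True:
        gap = rng.expovariate(1.0)
        if t + gap > horizon:
            return max(worst, abs(count - horizon))
        t += gap
        worst = max(worst, abs(count - t))  # value just before the jump
        count += 1
        worst = max(worst, abs(count - t))  # value right after the jump

r, q = 10**4, 0.75
print(sup_centered_poisson(r) / r**q)  # r^{-q} sup |Pi(t) - t| on [0, r]; well below 1
```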



To prove Theorem 18, consider the following construction of the probability space. (We want to strongly emphasize that this construction will be used only for the purpose of proving Theorem 18. For the proof of Theorem 7 we can and will use a different probability space construction.) For each $r$, we divide the time interval $[0, T r^{1-p}]$ into $r^{1-p}$ subintervals of length $T$, namely $[(m-1)T, mT]$ with $m = 1, 2, \ldots, r^{1-p}$. In each of the subintervals, and for each $r$, we consider independent unit rate Poisson processes $\Pi_i^{r,m}$, $\hat{\Pi}_i^{r,m}$, $\check{\Pi}_i^{r,m}$, driving type-$i$ exogenous arrivals, actual customer departures and token departures, respectively. More precisely, the numbers of type-$i$ exogenous arrivals, actual customer departures and token departures, by time $t$ from the beginning of the $m$-th interval, are given by

$$\Pi_i^{r,m}(\lambda_i r t), \qquad \hat{\Pi}_i^{r,m}\Big(\mu_i \int_{(m-1)T}^{(m-1)T+t} Y^r_i(\xi)\,d\xi\Big), \qquad \check{\Pi}_i^{r,m}\Big(\mu_0 \int_{(m-1)T}^{(m-1)T+t} Y_i^{\prime r}(\xi)\,d\xi\Big),$$

respectively. Using (22)-(24) we obtain the following property for $\Pi_i^{r,m}$ (and analogous ones for $\hat{\Pi}_i^{r,m}$ and $\check{\Pi}_i^{r,m}$):

$$\max_{1 \le m \le r^{1-p}}\; \max_{0 \le t \le rT} r^{-q} |\Pi_i^{r,m}(t) - t| \to 0, \quad \text{as } r \to \infty, \text{ w.p.1}. \qquad (25)$$

We denote

$$g^r(t) = (y^r_1(t), y^r_2(t)) = r^{-q}\left(Y^r_i(t) - \rho_i r,\; Y_i^{\prime r}(t)\right).$$

Then we can prove the following.

Lemma 20. Consider fixed realizations (for each $r$) of driving processes, such that the properties (25) hold with $q$ replaced by a smaller parameter $q' \in (1/2, q)$. Consider the corresponding sequence of realizations of $(g^r(t), t \ge 0)$, with bounded initial states $\|g^r(0)\| \le \epsilon$, $\epsilon > 0$. Then there exists a subsequence of $r$ along which

$$g^r(t) \to g(t), \quad \text{u.o.c.}, \qquad (26)$$

where $(g(t), t \ge 0)$ is Lipschitz continuous, with $\|g(0)\| \le \epsilon$, and it satisfies the conditions

$$(d/dt)\, y_1(t) = -\mu_i y_1(t), \qquad (27)$$

$$(d/dt)\, y_2(t) = \begin{cases} \mu_i y_1(t) - \mu_0 y_2(t), & \text{if } y_2(t) > 0, \\ \max\{0,\ \mu_i y_1(t) - \mu_0 y_2(t)\}, & \text{if } y_2(t) = 0, \end{cases} \qquad (28)$$

at points $t > 0$ where the derivatives exist (which is almost everywhere w.r.t. the Lebesgue measure). Moreover, the convergence

$$\|g(t)\| \to 0, \quad t \to \infty, \qquad (29)$$

holds and is uniform w.r.t. initial states with $\|g(0)\| \le \epsilon$, and

$$\sup_{\|g(0)\| \le \epsilon}\; \max_{t \ge 0} \|g(t)\| \to 0, \quad \epsilon \to 0. \qquad (30)$$

As a consequence of (30),

$$\|g(0)\| = 0 \ \text{implies}\ \|g(t)\| = 0, \quad \forall t. \qquad (31)$$
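The limit dynamics (27)-(28) are simple enough to integrate numerically. The following Python sketch uses an explicit Euler scheme (our choice, with an arbitrary step size) and keeps $y_2$ nonnegative by projecting onto $[0, \infty)$, which reproduces the $\max\{0, \cdot\}$ rule at $y_2 = 0$; the rates, initial conditions and the function name are illustrative. It shows trajectories decaying to 0, as asserted in (29).

```python
def simulate_limit(y1, y2, mu_i, mu_0, t_end, dt=1e-3):
    # Explicit Euler for (27)-(28): y1' = -mu_i * y1 and
    # y2' = mu_i * y1 - mu_0 * y2, with y2 projected to stay >= 0.
    for _ in range(int(round(t_end / dt))):
        y1, y2 = y1 - dt * mu_i * y1, max(0.0, y2 + dt * (mu_i * y1 - mu_0 * y2))
    return y1, y2

# With mu_i = mu_0 = 1 and y2 > 0 throughout, y2(t) = (y2(0) + y1(0) t) e^{-t}.
print(simulate_limit(1.0, 0.5, 1.0, 1.0, 1.0))   # y2 close to 1.5 / e, about 0.552
print(simulate_limit(1.0, 0.5, 1.0, 1.0, 20.0))  # both components near 0
print(simulate_limit(-1.0, 0.0, 2.0, 0.5, 5.0))  # y2 stays at 0
```

In the last case the centered number of actual customers starts below its mean, so no tokens are ever created and $y_2$ remains at the boundary, illustrating the second branch of (28).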

Lemma 20 is analogous to Lemma 14 in [1], except that the space scaling by $r^{-q}$ is applied, as opposed to the fluid scaling by $r^{-1}$, and the number of actual customers $Y^r_i(t)$ is centered before scaling. The proof is somewhat more involved: the main issue is that (unlike for the fluid limit) the Lipschitz property of the limit is no longer automatic, because the rates of arrivals and departures in the system are $O(r)$, while the space is only scaled down by $r^q$. (That is why we need to use properties (25), as opposed to simply a strong law of large numbers.) However, this issue can be resolved as in, for example, the proof of Theorem 23 in [13]. We omit a detailed proof.

Proof of Theorem 18. By Corollary 17, we can choose a subsequence of $r$ (increasing sufficiently fast) so that

$$\|g^r(0)\| \to 0, \quad \text{w.p.1}.$$

Then we use the construction of the probability space specified above, which guarantees that w.p.1 the properties (25) hold with $q$ replaced by a smaller parameter $q' \in (1/2, q)$; let us consider any element of the probability space for which the properties (25) do hold. We claim that, for this element, (20) holds. Suppose not. Then there exist $\epsilon > 0$ and a further subsequence of $r$ along which $t^r_* = \inf\{t : \|g^r(t)\| > \epsilon\} < T r^{1-p}$. By Lemma 20, we can and do choose a time duration $T_1 > 0$ such that any limit trajectory $g(t)$ with $\|g(0)\| \le \epsilon$ satisfies $\|g(T_1)\| < \epsilon/2$. For each $r$, consider the trajectory of $g^r$ on the time interval $[t^r_* - T_1, t^r_*]$. (Suppose for now that $t^r_* \ge T_1$ for all sufficiently large $r$.) Then we can choose a further subsequence of $r$ along which $g^r(t^r_* - T_1 + t) \to g(t)$ uniformly for $t \in [0, T_1]$, for a limit function $g(t)$ as in Lemma 20. But this is impossible, because then $\|g(T_1)\| < \epsilon/2$, whereas $\|g^r(t^r_*)\| \ge \epsilon$ for all $r$. The case when $t^r_* < T_1$ for infinitely many $r$ is even simpler: we choose a further subsequence along which this is true, and consider the trajectories of $g^r$ on the fixed time interval $[0, T_1]$. In this case any limit trajectory $g(t)$ described in Lemma 20 stays at 0 in the entire interval $[0, T_1]$, because $\|g(0)\| = \lim_r \|g^r(0)\| = 0$. This means that $\|g^r(t^r_*)\| \to 0$, again a contradiction. □
From this point on, we assume the following structure of the probability space. (It is different from the one used for the proof of Theorem 18, which, as we discussed, was for that proof only.) There are common (for all $r$) unit rate Poisson processes driving the system, defined as follows. For each $(k,i) \in \mathcal{M}$ and $\tilde{k} \le k$, consider an independent unit-rate Poisson process $\Pi_{(k,\tilde{k}),i}(t)$, $t \ge 0$, so that the number of actual type-$i$ customer departures from configuration $(k,\tilde{k})$ in the interval $[0,t]$ is equal to $\Pi_{(k,\tilde{k}),i}\big(\int_0^t \mu_i \tilde{k}_i X^r_{(k,\tilde{k})}(\xi)\,d\xi\big)$. Similarly, consider an independent unit-rate Poisson process $\{\Pi'_{(k,\tilde{k}),i}(t), t \ge 0\}$, so that the number of type-$i$ token departures from configuration $(k,\tilde{k})$ due to their expiration is equal to $\Pi'_{(k,\tilde{k}),i}\big(\int_0^t \mu_0 (k_i - \tilde{k}_i) X^r_{(k,\tilde{k})}(\xi)\,d\xi\big)$. (Thus $\tilde{k}_i$ counts actual type-$i$ customers and $k_i - \tilde{k}_i$ type-$i$ tokens.) Finally, for each $i \in \mathcal{I}$, let $\{\Pi_i(t), t \ge 0\}$ be an independent unit-rate Poisson process, such that the number of exogenous type-$i$ arrivals in $[0,t]$ is equal to $\Pi_i(\lambda_i r t)$. For a fixed parameter $T > 0$, whose value will be chosen later, each of the above Poisson processes satisfies Lemma 12, in which we can and do replace $T$ with $2T[(\bar{\mu} \vee \mu_0) + \sum_i \lambda_i]$. (We do this because we will "work" with system sample paths such that $\sum_i (Y^r_i + Y_i^{\prime r}) \le 2r$, and for these sample paths the total "instantaneous" rate of all transitions is upper bounded by $2r[(\bar{\mu} \vee \mu_0) + \sum_i \lambda_i]$.)



Denote by $D^{\prime r}_i(t_1,t_2)$ the number of type-$i$ token departures (due to their expirations), and by $A^{**,r}_i(t_1,t_2)$ the total number of exogenous type-$i$ arrivals (of actual customers) that do not replace type-$i$ tokens, all in the interval $(t_1, t_2]$. Also, denote $Y^r_i(t_1,t_2) = Y^r_i(t_2) - Y^r_i(t_1)$ and $Y_i^{\prime r}(t_1,t_2) = Y_i^{\prime r}(t_2) - Y_i^{\prime r}(t_1)$.

Theorem 21. Consider the sequence (in $r$) of open systems in stationary regime. Let $T > 0$ be fixed. Then any subsequence of $r$ contains a further subsequence such that, w.p.1, the following holds:

$$D^{\prime r}_i(t_0, t_0 + r^{p-1}) / [r^p r^{p-1}] \to 0, \qquad (32)$$

$$A^{**,r}_i(t_0, t_0 + r^{p-1}) / [r^p r^{p-1}] \to 0, \qquad (33)$$

uniformly on all intervals $[t_0, t_0 + r^{p-1}] \subseteq [0, T r^{1-p}]$.

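Proof. Indeed, by Theorem 18 we can and do choose a subsequence of $r$ along which (20)-(21) hold w.p.1. Then (32) follows from (20), which states that the number of tokens $Y_i^{\prime r}(t)$ is uniformly $o(r^q)$, and from the construction of the token departure processes, with the corresponding driving processes $\Pi'_{(k,\tilde{k}),i}$ satisfying Lemma 12. From (20) we also have the uniform convergence

$$\left[Y^r_i(t_0, t_0 + r^{p-1}) + Y_i^{\prime r}(t_0, t_0 + r^{p-1})\right] / [r^p r^{p-1}] \to 0.$$

But this, along with (32), implies the uniform convergence (33) as well, because we have the conservation law

$$Y^r_i(t_0, t_0 + r^{p-1}) + Y_i^{\prime r}(t_0, t_0 + r^{p-1}) = A^{**,r}_i(t_0, t_0 + r^{p-1}) - D^{\prime r}_i(t_0, t_0 + r^{p-1}),$$

since the total number of type-$i$ actual customers and tokens changes only through exogenous arrivals that do not replace tokens and through token expirations. The theorem is then proved. □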

Proof of Theorem 7. Consider the sequence of the system processes in stationary regime. Consider a fixed $T > 0$, chosen to be sufficiently large, as in Proposition 15. Consider any subsequence of $r$. Then we can and do choose a further subsequence of $r$ along which, w.p.1, (20)-(21) hold with some $q \in (1/2, p)$ (by Theorem 18), and the properties stated in Theorem 21 hold. As in the proof of Proposition 15, we will keep track of the evolution of the value of $F^r(X^r(t))$. We emphasize that this is exactly the same function as defined in Section 2.3 and used in the analysis of the closed system; namely, it has the fixed parameter $r$ (in the system with index $r$), and not the random "parameter" $Z^r$. We claim that the following property holds.

Claim: There exist positive constants $0 < C_1 < C_2$, $\delta > 0$, such that the following holds. For all sufficiently large $r$, uniformly on all intervals $[t_0, t_0 + r^{p-1}] \subseteq [0, T r^{1-p}]$, we have (a) $F^r(X^r(t_0)) - r u^* > C_1 r^p$ implies

$$F^r(X^r(t_0 + r^{p-1})) - F^r(X^r(t_0)) \le -\delta r^{2p-1},$$

and (b) $F^r(X^r(t_0)) - r u^* \le C_1 r^p$ implies

$$\sup_{\xi \in [0,1]} F^r(X^r(t_0 + \xi r^{p-1})) - r u^* \le C_2 r^p.$$

Clearly, (b) is analogous to Corollary 13 for the closed system and is proved in exactly the same way, with $\bar{\mu}$ in (14) replaced by $\bar{\mu} \vee \mu_0$. Statement (a) is analogous to Proposition 14 for the closed system, and we prove it below. It is also clear that the claim, along with (20)-(21), implies the theorem statement via an argument that repeats almost verbatim the one in the proof of Proposition 15.

It remains to prove (a). The proof is the same as that of Proposition 14, except that we have to make additional estimates accounting for: (i) token departures due to their expiration and actual customer arrivals that do not find tokens; (ii) the fact that GSS-M uses the weight function $\hat{w}^r$, computed with the random variable $Z^r$ in place of the constant $r$, as opposed to the function $w^r$ (which has the constant $r$ as a parameter). This is because, if we had only transitions associated with actual customer departures and actual customer arrivals replacing tokens, and the assignment decisions were based on the weight $w^r$ as opposed to $\hat{w}^r$, then exactly the same drift estimates as those in the proof of Proposition 14 would apply. Note that in (i) we consider exactly those transitions for which we have properties (32)-(33). Therefore, in any interval $[t_0, t_0 + r^{p-1}]$ the "worst case" possible increase in $F^r(X^r)$ due to such transitions is $o(r^{2p-1})$. (We omit obvious epsilon/delta formalities.) Now consider (ii). Since we have the uniform bound $|Z^r(t) - r| \le o(r^q)$, it is easy to check that $|\hat{w}^r(X) - w^r(X)| \le o(r^{q-1})$ for any $X \ge 0$. This means that the error in the calculation of the first-order contribution to the change of $F^r(X^r)$ in any $[t_0, t_0 + r^{p-1}]$, introduced by GSS-M using the weight $\hat{w}^r$ instead of $w^r$, is uniformly bounded by $o(r \cdot r^{p-1} \cdot r^{q-1}) = o(r^{p+q-1}) = o(r^{2p-1})$, since $q < p$. (Again, we omit epsilon/delta formalities.) We see that the potential positive contribution of both (i) and (ii) to the change of the objective function in any interval $[t_0, t_0 + r^{p-1}]$ is $o(r^{2p-1})$, uniformly on the choice of the interval. The estimate in (a) follows. This completes the proof of the above claim, and of the theorem. □

5. DISCUSSION 

We presented the policy Greedy with sublinear Safety Stocks (GSS), along with a variant, which asymptotically minimize the steady-state total number of occupied servers at the fluid scale, as the input flow rates grow to infinity. A technical novelty of GSS is that it automatically creates non-zero safety stocks, sublinear in the system "size", at server configurations which have zero stocks on the fluid scale. It is important to note that the algorithm does this without a priori knowledge of system parameters. To prove the fluid-scale optimality of GSS, we also need to consider a local fluid scaling, under which the sublinear safety stocks are "visible". This in turn allows us to obtain a tight asymptotic characterization of the algorithm's deviation from exact optimal packing.

We can extend GSS to policies that asymptotically minimize the more general objective $\sum_k c_k X_k$, where $c_k > 0$ can be interpreted as the "cost" (for example, some estimated energy cost) of keeping a server in configuration $k$, for each $k \in \mathcal{K}$. Instead of the weight function $w^r(X_k)$ for each $k \in \mathcal{K}$, consider the weight function $c_k w^r(X_k)$, and define $\Delta^r$ as the difference between the new weight functions. We can then define GSS and GSS-M using the new $\Delta^r$. They minimize the fluid scale quantity $\sum_k c_k x_k$ asymptotically, and similar convergence rates can be obtained. If we assume that the cost $c_k$ is monotonically non-decreasing in $k$ (i.e., $c_{k'} \le c_k$ if $k' \le k$), then all our results and proofs still hold essentially verbatim. If the costs $c_k$ are not monotone in $k$, most of the statements and proofs easily extend, except those of Lemmas 10 and 11, where some dual variables $\eta_i$ may need to be negative. These $\eta_i$ can be defined in a similar fashion as those in the proof of Lemma 6 in [13].

There are several possible directions for future research. For example, one may expect asymptotic optimality of "pure" GSS in an open system, which seems more difficult to establish. Proving or disproving its optimality may require a better understanding of, and some new insight into, the system dynamics. Another direction is the investigation of other (possibly simpler) policies than GSS. GSS is asymptotically optimal as the system scale increases. However, if the number $|\mathcal{K}|$ of feasible configurations is large, the system scale may need to be very large for near-optimal performance. It is then of interest to design policies (e.g., some form of best-fit) that have provably good performance properties at a wide range of system scales.
APPENDIX

A. PROOF OF LEMMA 3

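Both $\mathcal{X}$ and $\mathcal{X}^*$ are convex, compact polytopes with a finite number of extreme points. Let $S$ and $S^*$ be the sets of extreme points of $\mathcal{X}$ and of $\mathcal{X}^*$, respectively. Note that for all $x^* \in S^*$, $\sum_k x^*_k = u^*$. Moreover, there exists some $\delta > 0$ such that for all $x' \in S \setminus S^*$, $\sum_k x'_k \ge u^* + \delta$.

Let $\operatorname{conv}(S \setminus S^*)$ be the convex hull of the set $S \setminus S^*$. Then for all $x' \in \operatorname{conv}(S \setminus S^*)$, $\sum_k x'_k \ge u^* + \delta$. Consider the function $g : \operatorname{conv}(S \setminus S^*) \times \mathcal{X}^* \to \mathbb{R}$ defined by

$$ g(x', x^*) = \frac{\|x^* - x'\|}{\sum_{k \in \mathcal{K}} x'_k - u^*}. $$

Function $g$ is well-defined (its denominator is at least $\delta$), always positive and clearly continuous. Since both $\operatorname{conv}(S \setminus S^*)$ and $\mathcal{X}^*$ are compact, so is their product space. Thus there exists $D > 0$ such that $g$ is upper bounded by $D$.

For every $x \in \mathcal{X}$, there exists $\lambda \in [0, 1]$ such that $x = \lambda x' + (1 - \lambda) x^*$, with $x' \in \operatorname{conv}(S \setminus S^*)$ and $x^* \in \mathcal{X}^*$. Then

$$ \sum_{k \in \mathcal{K}} x_k - u^* = \lambda \Big( \sum_{k \in \mathcal{K}} x'_k - u^* \Big). $$

Thus,

$$ d(x, \mathcal{X}^*) \le \|x - x^*\| = \lambda \|x' - x^*\| \le \lambda D \Big( \sum_{k \in \mathcal{K}} x'_k - u^* \Big) = D \Big( \sum_{k \in \mathcal{K}} x_k - u^* \Big). $$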
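As a sanity check of Lemma 3, the following minimal Python sketch evaluates the constant $D$ on a hypothetical two-dimensional instance (not from the paper): $\mathcal{X} = \operatorname{conv}\{(1,0), (0,1), (2,2)\}$ with objective $\sum_k x_k$. Then $u^* = 1$, and $\mathcal{X}^*$ is the segment between $(1,0)$ and $(0,1)$. The sketch samples points $x = \lambda x' + (1-\lambda) x^*$ as in the proof and records the largest ratio $d(x, \mathcal{X}^*) / (\sum_k x_k - u^*)$, using the Euclidean norm. The helper names are illustrative only.

```python
import math
import random

# Hypothetical 2-D instance (not from the paper):
# X = conv{(1,0), (0,1), (2,2)}, objective sum(x), optimum u* = 1 on the
# segment X* = [(1,0), (0,1)].
A, B = (1.0, 0.0), (0.0, 1.0)   # extreme points of X*, i.e. the set S*
X_PRIME = (2.0, 2.0)            # the only point of S \ S*
U_STAR = 1.0


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))   # clip the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def g(x_prime, x_star):
    """g(x', x*) = ||x* - x'|| / (sum(x') - u*), as in the proof."""
    return math.dist(x_star, x_prime) / (sum(x_prime) - U_STAR)


# conv(S \ S*) is a single point here, and ||x* - x'|| is convex in x*,
# so the maximum of g over the segment X* is attained at an endpoint.
D = max(g(X_PRIME, A), g(X_PRIME, B))


def check_lemma3(n_samples=10_000, seed=0):
    """Largest observed ratio d(x, X*) / (sum(x) - u*) over random x in X."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_samples):
        lam, s = rng.random(), rng.random()
        x_star = (s * A[0] + (1 - s) * B[0], s * A[1] + (1 - s) * B[1])
        x = tuple(lam * xp + (1 - lam) * xs for xp, xs in zip(X_PRIME, x_star))
        excess = sum(x) - U_STAR
        if excess > 1e-9:   # skip points that lie numerically on X*
            worst = max(worst, dist_to_segment(x, A, B) / excess)
    return worst


print(f"D = {D:.4f}, worst sampled ratio = {check_lemma3():.4f}")
```

The constant produced by the compactness argument is conservative: the lemma only needs some finite $D$, not the best one.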

B. PROOF OF PROPOSITION ??

Let $\delta > 0$ be the same as in Proposition 14, and define $T = 3/\delta$. The constant $C > 0$ will be chosen sufficiently large; its value will be determined later in the proof. Clearly, to prove the proposition, it suffices to prove the stronger property

$$ \mathbb{P}\Big( d\big(x^r(T r^{1-p}), \mathcal{X}^*\big) \le C r^{p-1} \text{ for all large } r \Big) = 1. $$

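By Proposition 14, there exists $C_1 > 0$ with the following property, w.p.1. For sufficiently large $r$ and for any interval $[t_0, t_0 + r^{p-1}] \subset [0, T r^{1-p}]$, if $d(x^r(t_0), \mathcal{X}^*) \ge C_1 r^{p-1}$, then

$$ F^r\big(X^r(t_0 + r^{p-1})\big) - F^r\big(X^r(t_0)\big) \le -\delta r^{2p-1}. \tag{34} $$

We pick some $r$ such that the above statement holds and, furthermore, for every $t_0 \in [0, T r^{1-p}]$ and $\xi \in [0, 1]$,

$$ d\big(X^r(t_0 + \xi r^{p-1}), X^r(t_0)\big) \le O(r^p). \tag{35} $$

This can be done by Corollary ??.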

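Now we claim that $d(x^r(T r^{1-p}), \mathcal{X}^*) \le C r^{p-1}$. To establish the claim, we consider the set $\mathcal{L} = \{\ell \in \mathbb{Z}_+ : \ell r^{p-1} \in [0, T r^{1-p}]\}$, and prove that

(a) there exists $\ell_0 \in \mathcal{L}$ such that $d(x^r(\ell_0 r^{p-1}), \mathcal{X}^*) < C_1 r^{p-1}$, and

(b) there exists $C_2 > 0$ such that for all $\ell \in \mathcal{L}$ with $\ell \ge \ell_0$,

$$ F^r\big(X^r(\ell r^{p-1})\big) \le r u^* + C_2 r^p. $$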

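First suppose that (a) does not hold. Then for every $\ell \in \mathcal{L}$, $d(x^r(\ell r^{p-1}), \mathcal{X}^*) \ge C_1 r^{p-1}$, so by (34),

$$ F^r\big(X^r((\ell+1) r^{p-1})\big) - F^r\big(X^r(\ell r^{p-1})\big) \le -\delta r^{2p-1}. $$

Let $L = \lfloor T r^{2-2p} \rfloor$. Summing these inequalities over $\ell = 0, \dots, L-1$, we obtain

$$ F^r\big(X^r(L r^{p-1})\big) - F^r\big(X^r(0)\big) \le -L \delta r^{2p-1} \le -(T r^{2-2p} - 1)\, \delta r^{2p-1} = -T\delta r + \delta r^{2p-1}, $$

and hence

$$ F^r\big(X^r(L r^{p-1})\big) \le F^r\big(X^r(0)\big) - T\delta r + \delta r^{2p-1} \le r - \frac{3}{\delta}\,\delta r + \delta r^{2p-1} < 0 $$

for large $r$, since $2p - 1 < 1$. This contradicts the nonnegativity of $F^r$, so statement (a) is established.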
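The arithmetic in the proof of statement (a) can be checked numerically. The sketch below uses illustrative values $\delta = 0.5$ and $p = 0.75$ (hypothetical, not from the paper) and sets $T = 3/\delta$ and $L = \lfloor T r^{2-2p} \rfloor$. It then checks three facts: the $L$ intervals of length $r^{p-1}$ fit into $[0, T r^{1-p}]$; $L\delta r^{2p-1} \ge T\delta r - \delta r^{2p-1}$; and the resulting upper bound $r - T\delta r + \delta r^{2p-1}$ on $F^r$ is negative. Like the text, it takes $F^r(X^r(0)) \le r$ as given.

```python
import math


def part_a_quantities(r, delta, p):
    """Quantities from the proof of statement (a), for illustrative r, delta, p.

    Assumes, as in the text, T = 3/delta, L = floor(T r^(2-2p)), a drop of at
    least delta * r^(2p-1) per interval of length r^(p-1), and F^r(X^r(0)) <= r.
    """
    T = 3.0 / delta
    L = math.floor(T * r ** (2 - 2 * p))
    # L intervals of length r^(p-1) fit inside [0, T r^(1-p)] (small float slack).
    fits = L * r ** (p - 1) <= T * r ** (1 - p) * (1 + 1e-12)
    total_drop = L * delta * r ** (2 * p - 1)
    lower = T * delta * r - delta * r ** (2 * p - 1)   # (T r^(2-2p) - 1) delta r^(2p-1)
    drop_ok = total_drop >= lower * (1 - 1e-12)
    # Upper bound on F^r(X^r(L r^(p-1))) obtained by summing the drops.
    bound_on_F = r - T * delta * r + delta * r ** (2 * p - 1)
    return fits, drop_ok, bound_on_F


for r in (10.0, 1e3, 2e5):
    print(r, part_a_quantities(r, delta=0.5, p=0.75))
```

The contradiction comes entirely from the choice $T = 3/\delta$: the accumulated decrease $T\delta r = 3r$ exceeds the initial value bound $r$ by a linear margin, which dominates the $\delta r^{2p-1}$ rounding term.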

To establish statement (b), we use the following simple 
lemma, whose proof is omitted. 

Lemma 22. Let $K$, $\alpha$ and $\beta$ be given positive constants. Consider a sequence of real numbers $\{a_n\}$ that satisfies: (i) $a_0 \le K$, (ii) $a_{n+1} - a_n \le \alpha$, and (iii) if $a_n > K$, then $a_{n+1} - a_n \le -\beta$. Then $\max_n a_n \le K + \alpha$.
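Lemma 22 is easy to probe numerically. The following sketch builds random sequences that satisfy hypotheses (i) to (iii) and checks the conclusion $\max_n a_n \le K + \alpha$. The step distributions and the constants $K$, $\alpha$, $\beta$ are arbitrary choices, not taken from the paper. This is a spot check, not a proof.

```python
import random


def satisfies_hypotheses(a, K, alpha, beta, tol=1e-9):
    """Check hypotheses (i)-(iii) of Lemma 22 on a finite sequence a."""
    if a[0] > K + tol:                                   # (i)
        return False
    for prev, nxt in zip(a, a[1:]):
        if nxt - prev > alpha + tol:                     # (ii)
            return False
        if prev > K and nxt - prev > -beta + tol:        # (iii)
            return False
    return True


def random_admissible_sequence(n, K, alpha, beta, rng):
    """Random sequence obeying (i)-(iii); the step laws are arbitrary choices."""
    a = [rng.uniform(K - 5.0, K)]                        # start at or below K
    for _ in range(n - 1):
        if a[-1] > K:
            step = rng.uniform(-3.0 * beta, -beta)       # forced decrease above K
        else:
            step = rng.uniform(-2.0 * alpha, alpha)      # increase of at most alpha
        a.append(a[-1] + step)
    return a


rng = random.Random(1)
K, alpha, beta = 10.0, 1.5, 0.2
seqs = [random_admissible_sequence(200, K, alpha, beta, rng) for _ in range(500)]
print(all(satisfies_hypotheses(s, K, alpha, beta) for s in seqs),
      max(max(s) for s in seqs) <= K + alpha)   # Lemma 22: max_n a_n <= K + alpha
```

In the proof, the lemma is applied to $a_n = F^r(X^r((\ell_0 + n) r^{p-1}))$ with $K = r u^* + C_1 r^p$, $\alpha = C_3 r^p$ and $\beta = \delta r^{2p-1}$.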

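We will establish the following corresponding statements:

(i) $F^r(X^r(\ell_0 r^{p-1})) \le r u^* + C_1 r^p$. Recall that we have $d(x^r(\ell_0 r^{p-1}), \mathcal{X}^*) < C_1 r^{p-1}$, so by Lemma ??,

$$ F^r\big(X^r(\ell_0 r^{p-1})\big) - r u^* \le \sum_{k} X^r_k(\ell_0 r^{p-1}) - r u^* \le r\, d\big(x^r(\ell_0 r^{p-1}), \mathcal{X}^*\big) < C_1 r^p. $$

(ii) There exists $C_3 > 0$ such that $F^r(X^r((\ell+1) r^{p-1})) - F^r(X^r(\ell r^{p-1})) \le C_3 r^p$. This is clear for two reasons. By Lemma ??, $F^r(X^r)$ differs from $\sum_k X^r_k$ by $O(r^p)$. In addition, the change in $X^r$ is at most $O(r^p)$ over an interval of length $r^{p-1}$.

(iii) If $F^r(X^r(\ell r^{p-1})) > r u^* + C_1 r^p$, then

$$ F^r\big(X^r((\ell+1) r^{p-1})\big) - F^r\big(X^r(\ell r^{p-1})\big) \le -\delta r^{2p-1}. $$

To see this, suppose that $F^r(X^r(\ell r^{p-1})) > r u^* + C_1 r^p$. Then

$$ d\big(x^r(\ell r^{p-1}), \mathcal{X}^*\big) \ge \sum_k x^r_k(\ell r^{p-1}) - u^* \ge \frac{1}{r} F^r\big(X^r(\ell r^{p-1})\big) - u^* > C_1 r^{p-1}, $$

and by (34) we must have

$$ F^r\big(X^r((\ell+1) r^{p-1})\big) - F^r\big(X^r(\ell r^{p-1})\big) \le -\delta r^{2p-1}. $$

We apply Lemma 22 with $K = r u^* + C_1 r^p$, $\alpha = C_3 r^p$ and $\beta = \delta r^{2p-1}$. For all $\ell \in \mathcal{L}$ with $\ell \ge \ell_0$, this gives

$$ F^r\big(X^r(\ell r^{p-1})\big) \le r u^* + (C_1 + C_3) r^p = r u^* + C_2 r^p, $$

by letting $C_2 = C_1 + C_3$. This establishes statement (b). In particular, for $L = \lfloor T r^{2-2p} \rfloor$,

$$ F^r\big(X^r(L r^{p-1})\big) \le r u^* + C_2 r^p. $$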

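Now by (35), the difference between $X^r(T r^{1-p})$ and $X^r(L r^{p-1})$ is $O(r^p)$. Furthermore, the difference between $F^r(X^r(L r^{p-1}))$ and $\sum_k X^r_k(L r^{p-1})$ is also $O(r^p)$. This implies that

$$ \sum_{k \in \mathcal{K}} X^r_k(T r^{1-p}) - r u^* \le C_2 r^p + O(r^p). $$

Thus, there exists $C > 0$ such that

$$ \sum_{k \in \mathcal{K}} x^r_k(T r^{1-p}) - u^* \le \frac{C}{D} r^{p-1}. $$

By Lemma 3,

$$ d\big(x^r(T r^{1-p}), \mathcal{X}^*\big) \le D \Big( \sum_{k \in \mathcal{K}} x^r_k(T r^{1-p}) - u^* \Big) \le C r^{p-1}, $$

and we have established the claim. Therefore, w.p.1,

$$ d\big(x^r(T r^{1-p}), \mathcal{X}^*\big) \le C r^{p-1} $$

for all sufficiently large $r$. This establishes the proposition.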



C. PROOF OF THEOREM ??

The general approach of the proof is similar to that of Theorem 2(ii) in [5], in that it is based on process generator estimates for the exponent $e^{\Phi}$, where $\Phi$ is a function on the state space. However, the function $\Phi$ in our case is quite different, and so are the specifics of the estimates. Consider fixed $i \in \mathcal{I}$ and $r$. For notational convenience, we drop the subscript $i$ and superscript $r$ from all quantities considered in this proof. The Markov chain $U(\cdot) = (Y(\cdot), \bar Y(\cdot))$ has infinitesimal transition rate matrix $Q$ given by



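$$ Q(u, u + v) = \begin{cases} \lambda r, & \text{if } v = (1, -\mathbf{1}_{\{\bar y > 0\}}), \\ \mu y, & \text{if } v = (-1, 1), \\ \mu_0 \bar y, & \text{if } v = (0, -1), \\ 0, & \text{otherwise}, \end{cases} $$

where $u = (y, \bar y)$. We consider $A$, the infinitesimal generator of the Markov chain $U(\cdot)$, defined by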



$$ AG(u) = \sum_{u'} Q(u, u') \big( G(u') - G(u) \big), \tag{36} $$

for all functions $G : \mathbb{Z}_+^2 \to \mathbb{R}$ in the domain of $A$. We also consider the formal operator $\tilde A$, defined (similarly to Eq. (36)) by

$$ \tilde A G(u) = \sum_{u'} Q(u, u') \big( G(u') - G(u) \big), \tag{37} $$

for all functions $G : \mathbb{Z}_+^2 \to \mathbb{R}$. Similarly to [5], it is easy to observe that the following property holds. If a function $G$ takes a fixed constant value on the entire state space, except possibly on a finite subset, then $G$ is within the domain of $A$, $AG = \tilde A G$, and moreover

$$ \mathbb{E}[AG(U)] = \mathbb{E}[\tilde A G(U)] = 0, \tag{38} $$

where the expectation is taken w.r.t. the stationary distribution of the Markov chain $U(\cdot)$.

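First, define the (candidate) Lyapunov function $G : \mathbb{Z}_+^2 \to \mathbb{R}$ by

$$ G(u) = \exp\Big( \frac{1}{\sqrt r} h(u) \Big), $$

where $h(u) = \sqrt{(y - \rho r)^2 + \frac{\mu_0}{\mu} \bar y^2}$. Note that, for an arbitrary $b > 0$, the truncated function

$$ G^{(b)}(u) = \exp\Big( \frac{h(u)}{\sqrt r} \wedge b \Big) $$

is constant outside a finite subset and therefore, by (38),

$$ \mathbb{E}[A G^{(b)}(U)] = 0. \tag{39} $$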

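Also note that

$$ \tilde A G^{(b)}(u) \le \tilde A G(u), \quad \text{if } h(u)/\sqrt r < b, $$

$$ \tilde A G^{(b)}(u) \le 0, \quad \text{if } h(u)/\sqrt r \ge b. $$

The first inequality holds because $G^{(b)}(u) = G(u)$ and $G^{(b)}(u') \le G(u')$ in that case. The second holds because $G^{(b)}$ attains its maximum value $e^b$ at such $u$.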
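The two properties of the truncated function can be checked directly on a grid of states. In this sketch, the transition rates follow $Q$ above, and `A_tilde` evaluates the formal operator $\tilde A$ of (37). The parameter values $\lambda = 1$, $\mu = 1$, $\mu_0 = 0.5$, $r = 25$, $b = 3$ are hypothetical, and $\rho = \lambda/\mu$ is an assumption of the sketch.

```python
import math

# Hypothetical parameters; rho = lam / mu is an assumption of this sketch.
LAM, MU, MU0, R, B = 1.0, 1.0, 0.5, 25.0, 3.0
RHO = LAM / MU


def transitions(u):
    """(next state, rate) pairs out of u = (y, ybar), following Q."""
    y, ybar = u
    out = [((y + 1, ybar - (1 if ybar > 0 else 0)), LAM * R)]   # arrival
    if y > 0:
        out.append(((y - 1, ybar + 1), MU * y))                 # departure
    if ybar > 0:
        out.append(((y, ybar - 1), MU0 * ybar))                 # decay of ybar
    return out


def A_tilde(G, u):
    """Formal operator (37): sum over u' of Q(u, u') * (G(u') - G(u))."""
    g_u = G(u)
    return sum(rate * (G(v) - g_u) for v, rate in transitions(u))


def h(u):
    y, ybar = u
    return math.sqrt((y - RHO * R) ** 2 + (MU0 / MU) * ybar ** 2)


def G(u):
    return math.exp(h(u) / math.sqrt(R))


def G_b(u):
    return math.exp(min(h(u) / math.sqrt(R), B))   # truncated at level b = B


def check_truncation(ymax=60, ybarmax=60):
    for y in range(ymax + 1):
        for ybar in range(ybarmax + 1):
            u = (y, ybar)
            ab = A_tilde(G_b, u)
            if h(u) / math.sqrt(R) < B:
                ag = A_tilde(G, u)
                if ab > ag + 1e-9 * (1.0 + abs(ag)):   # expect A~G^(b) <= A~G
                    return False
            elif ab > 0.0:                             # expect A~G^(b) <= 0
                return False
    return True


print(check_truncation())
```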

Similarly to [5], the following inequality can be derived using Taylor expansion. There exists some constant $c_2 > 0$ such that for sufficiently large $r$,

$$ \tilde A G(u) \le G(u) \Big( \frac{1}{\sqrt r} \tilde A h(u) + \frac{c_2}{r} (\lambda r + \mu y + \mu_0 \bar y) \Big). \tag{40} $$

The term $\tilde A h(u)$ captures the first-order change in $G(u)$, and $\frac{c_2 G(u)}{r} (\lambda r + \mu y + \mu_0 \bar y)$ bounds the second-order change. Here we used the fact that $h$ is Lipschitz continuous and each component of $u$ changes by at most 1 in any single transition. Now consider the term $\tilde A h(u)$. We use the following inequality to bound $\tilde A h(u)$:

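$$ \sqrt{(x + a)^2 + (y + b)^2} - \sqrt{x^2 + y^2} \le \frac{ax + by + a^2 + b^2}{\sqrt{x^2 + y^2}}, \qquad (x, y) \ne (0, 0). $$

To verify this inequality, note that, first,

$$ \Big( \sqrt{(x + a)^2 + (y + b)^2} \Big)^2 \le \Big( \sqrt{x^2 + y^2} + \frac{ax + by + a^2 + b^2}{\sqrt{x^2 + y^2}} \Big)^2, $$

and second,

$$ \sqrt{x^2 + y^2} + \frac{ax + by + a^2 + b^2}{\sqrt{x^2 + y^2}} \ge 0. $$

The first follows by expanding both squares. The second holds because $x^2 + ax + a^2 + y^2 + by + b^2 = (x + a/2)^2 + (y + b/2)^2 + \tfrac{3}{4}(a^2 + b^2) \ge 0$.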
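The elementary inequality, together with the auxiliary fact used to verify it, can be spot-checked with random inputs. This is a sketch; the sampling ranges and the tolerance are arbitrary choices.

```python
import math
import random


def lhs(x, y, a, b):
    return math.hypot(x + a, y + b) - math.hypot(x, y)


def rhs(x, y, a, b):
    return (a * x + b * y + a * a + b * b) / math.hypot(x, y)


def count_violations(n=100_000, seed=0, tol=1e-9):
    """Count random (x, y, a, b) violating the inequality or the auxiliary fact."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        a, b = rng.uniform(-3, 3), rng.uniform(-3, 3)
        if math.hypot(x, y) < 1e-6:        # the inequality needs (x, y) != (0, 0)
            continue
        if lhs(x, y, a, b) > rhs(x, y, a, b) + tol:
            bad += 1
        if math.hypot(x, y) + rhs(x, y, a, b) < -tol:   # second fact in the verification
            bad += 1
    return bad


print(count_violations())
```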



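Thus, applying this inequality to $h(u)$ for each of the three transition types, we obtain the bound below. In the first step we use $\bar y \mathbf{1}_{\{\bar y > 0\}} = \bar y$. In the equality we use $\lambda r - \mu y = -\mu(y - \rho r)$, since $\rho = \lambda/\mu$. When $h(u) \ge \sqrt r$,

$$ \begin{aligned} \tilde A h(u) &\le \frac{(\lambda r - \mu y)(y - \rho r) - (\lambda r - \mu y + \mu_0 \bar y)(\mu_0 \bar y / \mu)}{\sqrt{(y - \rho r)^2 + \mu_0 \bar y^2 / \mu}} + \frac{c_3 (\lambda r + \mu y + \mu_0 \bar y)}{h(u)} \\ &= \frac{-\frac{\mu}{2}(y - \rho r)^2 - \frac{\mu_0^2}{2\mu} \bar y^2 - \frac{1}{2\mu}\big(\mu(y - \rho r) - \mu_0 \bar y\big)^2}{\sqrt{(y - \rho r)^2 + \mu_0 \bar y^2 / \mu}} + \frac{c_3 (\lambda r + \mu y + \mu_0 \bar y)}{h(u)} \\ &\le -c_4 h(u) + \frac{c_3}{\sqrt r} (\lambda r + \mu y + \mu_0 \bar y), \end{aligned} \tag{41} $$

for some positive constants $c_3$ and $c_4$.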
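The algebraic step in (41) can also be checked numerically. The sketch below computes the first-order drift directly from the transition rates of $Q$ and compares it with the closed form, assuming $\rho = \lambda/\mu$. It also checks that $c_4 = \min(\mu, \mu_0)/2$ is one admissible choice, i.e. that $-\frac{\mu}{2}(y-\rho r)^2 - \frac{\mu_0^2}{2\mu}\bar y^2 \le -c_4 h(u)^2$. That choice of $c_4$, the parameter ranges and the tolerances are illustrative assumptions.

```python
import random


def first_order_drift(y, ybar, lam, mu, mu0, r):
    """Sum over transitions of rate * (dy * (y - rho r) + dybar * (mu0/mu) * ybar)."""
    rho = lam / mu                       # assumption: rho = lambda / mu
    X, w = y - rho * r, (mu0 / mu) * ybar
    moves = [(lam * r, 1, -1 if ybar > 0 else 0),   # arrival
             (mu * y, -1, 1),                        # departure
             (mu0 * ybar, 0, -1)]                    # decay of ybar
    return sum(rate * (dy * X + dyb * w) for rate, dy, dyb in moves)


def closed_form(y, ybar, lam, mu, mu0, r):
    X = y - (lam / mu) * r
    return (-(mu / 2) * X ** 2 - (mu0 ** 2 / (2 * mu)) * ybar ** 2
            - (mu * X - mu0 * ybar) ** 2 / (2 * mu))


def h_squared(y, ybar, lam, mu, mu0, r):
    return (y - (lam / mu) * r) ** 2 + (mu0 / mu) * ybar ** 2


def check(n=20_000, seed=0):
    rng = random.Random(seed)
    for _ in range(n):
        lam, mu, mu0 = (rng.uniform(0.1, 5.0) for _ in range(3))
        r = rng.uniform(1.0, 500.0)
        y, ybar = rng.randint(0, 2000), rng.randint(0, 2000)
        d = first_order_drift(y, ybar, lam, mu, mu0, r)
        c = closed_form(y, ybar, lam, mu, mu0, r)
        scale = 1.0 + abs(d) + abs(c)
        if abs(d - c) > 1e-9 * scale:                  # identity used in (41)
            return False
        c4 = min(mu, mu0) / 2
        if c > -c4 * h_squared(y, ybar, lam, mu, mu0, r) + 1e-9 * scale:
            return False
    return True


print(check())
```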
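Combining Inequalities (40) and (41), we have

$$ \tilde A G(u) \le G(u) \Big( -\frac{c_4}{\sqrt r} h(u) + \frac{c_2 + c_3}{r} (\lambda r + \mu y + \mu_0 \bar y) \Big). $$

Consider the term in the brackets on the right-hand side. An elementary calculation shows that there exists some positive constant $c_5$ such that, whenever $h(u) \ge c_5 \sqrt r$,

$$ -\frac{c_4}{\sqrt r} h(u) + \frac{c_2 + c_3}{r} (\lambda r + \mu y + \mu_0 \bar y) \le -1. $$

Also note that when $h(u) < c_5 \sqrt r$, the maximum values of

$$ G(u) \quad \text{and} \quad G(u) \Big( -\frac{c_4}{\sqrt r} h(u) + \frac{c_2 + c_3}{r} (\lambda r + \mu y + \mu_0 \bar y) \Big) $$

are both bounded above by an absolute constant, say $c_6$, which does not depend on $r$. In summary,

$$ \tilde A G(u) \le -G(u) \quad \text{whenever } h(u) \ge c_5 \sqrt r, $$

$$ \tilde A G(u) \le c_6 \quad \text{whenever } h(u) < c_5 \sqrt r. $$

Thus, for any $b > c_5$,

$$ \begin{aligned} 0 = \mathbb{E}[A G^{(b)}(U)] &\le \mathbb{E}\big[\tilde A G(U) \mathbf{1}_{\{c_5 \sqrt r \le h(U) < b \sqrt r\}}\big] + \mathbb{E}\big[\tilde A G(U) \mathbf{1}_{\{h(U) < c_5 \sqrt r\}}\big] \\ &\le -\mathbb{E}\big[G(U) \mathbf{1}_{\{c_5 \sqrt r \le h(U) < b \sqrt r\}}\big] + c_6. \end{aligned} $$

This implies that $\mathbb{E}[G(U) \mathbf{1}_{\{c_5 \sqrt r \le h(U) < b \sqrt r\}}] \le c_6$, and then $\mathbb{E}[G(U) \mathbf{1}_{\{h(U) < b \sqrt r\}}] \le 2 c_6$. Finally, by monotone convergence (letting $b \to \infty$), $\mathbb{E}[G(U)] \le 2 c_6$. This completes the proof.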
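To see the $O(\sqrt r)$ fluctuation scale behind this bound, one can estimate $\mathbb{E}[G(U)] = \mathbb{E}[\exp(h(U)/\sqrt r)]$ by simulating $U(\cdot)$ with the rates of $Q$ and time-averaging $G$ along the path. This is a Monte Carlo sketch, not part of the proof. The values $\lambda = \mu = \mu_0 = 1$, the relation $\rho = \lambda/\mu$, the initial state, the run length and the burn-in are all assumptions. Since $c_6$ does not depend on $r$, one would expect the estimates to stay of comparable size as $r$ grows.

```python
import math
import random


def simulate_G_mean(r, lam=1.0, mu=1.0, mu0=1.0,
                    n_events=50_000, burn_in=5_000, seed=0):
    """Time-average of exp(h(U)/sqrt(r)) along a simulated path of U = (y, ybar)."""
    rng = random.Random(seed)
    rho = lam / mu                                # assumption: rho = lambda / mu
    y, ybar = round(rho * r), 0                   # hypothetical initial state
    acc_time = acc_val = 0.0
    for k in range(n_events):
        rates = (lam * r, mu * y, mu0 * ybar)     # arrival, departure, decay
        total = sum(rates)
        dt = rng.expovariate(total)               # holding time in the current state
        if k >= burn_in:
            hval = math.sqrt((y - rho * r) ** 2 + (mu0 / mu) * ybar ** 2)
            acc_val += math.exp(hval / math.sqrt(r)) * dt
            acc_time += dt
        x = rng.random() * total                  # choose which transition fires
        if x < rates[0]:
            y += 1
            if ybar > 0:
                ybar -= 1
        elif x < rates[0] + rates[1]:
            y -= 1
            ybar += 1
        else:
            ybar -= 1
    return acc_val / acc_time


for r in (25, 100, 400):
    print(r, round(simulate_G_mean(r), 3))
```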



